Preface


Preface to the Second Edition 


The second edition is revised, expanded and enhanced. This is now a more
complete text in Stochastic Calculus, from both a theoretical and an applications
point of view. The changes came about as a result of using this book
for teaching courses in Stochastic Calculus and Financial Mathematics over a
number of years. Many topics are expanded with more worked-out examples
and exercises. Solutions to selected exercises are included. A new chapter
on bonds and interest rates contains derivations of the main pricing mod- 
els, including currently used market models (BGM). The change of numeraire 
technique is demonstrated on interest rate, currency and exotic options. The 
presentation of Applications in Finance is now more comprehensive and self- 
contained. The models in Biology introduced in the new edition include the 
age-dependent branching process and a stochastic model for competition of 
species. These Markov processes are treated by Stochastic Calculus tech- 
niques using some new representations, such as a relation between Poisson 
and Birth-Death processes. The mathematical theory of filtering is based on 
the methods of Stochastic Calculus. In the new edition, we derive stochastic 
equations for a non-linear filter first and obtain the Kalman-Bucy filter as a 
corollary. Models arising in applications are treated rigorously, demonstrating
how to apply theoretical results to particular models. This approach may
make some passages harder to read; however, by using this book, the reader
will acquire a working knowledge of Stochastic Calculus.


Preface to the First Edition 


This book aims at providing a concise presentation of Stochastic Calculus with 
some of its applications in Finance, Engineering and Science. 

During the past twenty years, there has been an increasing demand for tools 
and methods of Stochastic Calculus in various disciplines. One of the greatest 
demands has come from the growing area of Mathematical Finance, where 
Stochastic Calculus is used for pricing and hedging of financial derivatives, 
such as options. In Engineering, Stochastic Calculus is used in filtering and
control theory. In Physics, Stochastic Calculus is used to study the effects 
of random excitations on various physical phenomena. In Biology, Stochastic 
Calculus is used to model the effects of stochastic variability in reproduction 
and environment on populations. 

From an applied perspective, Stochastic Calculus can be loosely described
as a field of Mathematics that is concerned with infinitesimal calculus on
non-differentiable functions. The need for this calculus comes from the necessity to
include unpredictable factors in modelling. This is where probability comes
in, and the result is a calculus for random functions or stochastic processes.

This is a mathematical text that builds on the theory of functions and probability
and develops the martingale theory, which is highly technical. This
text is aimed at gradually taking the reader from a fairly low technical level 
to a sophisticated one. This is achieved by making use of many solved exam- 
ples. Every effort has been made to keep presentation as simple as possible, 
while mathematically rigorous. Simple proofs are presented, but more techni- 
cal proofs are left out and replaced by heuristic arguments with references to 
other more complete texts. This allows the reader to arrive at advanced results 
sooner. These results are required in applications. For example, the change
of measure technique is needed in options pricing; calculation of conditional
expectations with respect to a new filtration is needed in filtering. It turns out
that completely unrelated applied problems have their solutions rooted in the
same mathematical result. For example, the problem of pricing an option and
the problem of optimal filtering of a noisy signal both rely on the martingale
representation property of Brownian motion.

This text presumes less initial knowledge than most texts on the subject 
(Métivier (1982), Dellacherie and Meyer (1982), Protter (1992), Liptser and 
Shiryayev (1989), Jacod and Shiryayev (1987), Karatzas and Shreve (1988), 
Stroock and Varadhan (1979), Revuz and Yor (1991), Rogers and Williams 
(1990)); however, it still presents a fairly complete and mathematically rigorous
treatment of Stochastic Calculus for both continuous processes and processes 
with jumps. 

A brief description of the contents follows (for more details see the Table 
of Contents). The first two chapters describe the basic results in Calculus and 
Probability needed for further development. These chapters have examples but 
no exercises. Some more technical results in these chapters may be skipped 
and referred to later when needed. 

In Chapter 3, the two main stochastic processes used in Stochastic Calculus 
are given: Brownian motion (for calculus of continuous processes) and Poisson 
process (for calculus of processes with jumps). Integration with respect to 
Brownian motion and closely related processes (Itô processes) is introduced
in Chapter 4. It allows one to define a stochastic differential equation. Such 
equations arise in applications when random noise is introduced into ordinary
differential equations. Stochastic differential equations are treated in Chapter
5. Diffusion processes arise as solutions to stochastic differential equations;
they are presented in Chapter 6. As the name suggests, diffusions describe a
real physical phenomenon, and are met in many real-life applications. Chapter
7 contains information about martingales, examples of which are provided by 
Itô processes and compensated Poisson processes, introduced in earlier chap- 
ters. The martingale theory provides the main tools of stochastic calculus. 
These include optional stopping, localization and martingale representations. 
These are abstract concepts, but they arise in applied problems, where their 
use is demonstrated. Chapter 8 gives a brief account of calculus for most 
general processes, called semimartingales. Basic results include Itô’s formula 
and stochastic exponential. The reader has already met these concepts in 
Brownian motion calculus given in Chapter 4. Chapter 9 treats Pure Jump 
processes, where they are analyzed by using compensators. The change of 
measure is given in Chapter 10. This topic is important in options pric- 
ing, and for inference for stochastic processes. Chapters 11-14 are devoted 
to applications of Stochastic Calculus. Applications in Finance are given in 
Chapters 11 and 12, stocks and currency options (Chapter 11); bonds, inter- 
est rates and their options (Chapter 12). Applications in Biology are given 
in Chapter 13. They include diffusion models, Birth-Death processes, age- 
dependent (Bellman-Harris) branching processes, and a stochastic version of 
the Lotka-Volterra model for competition of species. Chapter 14 gives ap- 
plications in Engineering and Physics. Equations for a non-linear filter are 
derived, and applied to obtain the Kalman-Bucy filter. Random perturba- 
tions to two-dimensional differential equations are given as an application in 
Physics. Exercises are placed at the end of each chapter. 

This text can be used for a variety of courses in Stochastic Calculus and 
Financial Mathematics. The application to Finance is extensive enough to 
use it for a course in Mathematical Finance and for self study. This text is 
suitable for advanced undergraduate students, graduate students as well as 
research workers and practitioners.
Chapter 1


Preliminaries From 
Calculus 


Stochastic calculus deals with functions of time $t$, $0 \le t \le T$. In this chapter
some concepts of the infinitesimal calculus used in the sequel are given. 


1.1 Functions in Calculus 


Continuous and Differentiable Functions 


A function g is called continuous at the point $t = t_0$ if the increment of g over
small intervals is small,

$$\Delta g(t) = g(t) - g(t_0) \to 0 \quad \text{as} \quad \Delta t = t - t_0 \to 0.$$


If g is continuous at every point of its domain of definition, it is simply 
called continuous. 

g is called differentiable at the point $t = t_0$ if at that point

$$\Delta g \sim C \Delta t \quad \text{or} \quad \lim_{\Delta t \to 0} \frac{\Delta g(t)}{\Delta t} = C;$$

this constant C is denoted by $g'(t_0)$. If g is differentiable at every point of its
domain, it is called differentiable.

An important application of the derivative is a theorem on finite incre- 
ments. 


Theorem 1.1 (Mean Value Theorem) If f is continuous on [a,b] and has
a derivative on (a,b), then there is c, a < c < b, such that

$$f(b) - f(a) = f'(c)(b - a). \qquad (1.1)$$
Clearly, differentiability implies continuity, but not the other way around,
as continuity states that the increment $\Delta g$ converges to zero together with
$\Delta t$, whereas differentiability states that this convergence is at the same rate
or faster.


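Example 1.1: The function $g(t) = \sqrt{t}$ is not differentiable at 0, as at this point

$$\frac{\Delta g}{\Delta t} = \frac{\sqrt{\Delta t}}{\Delta t} = \frac{1}{\sqrt{\Delta t}} \to \infty$$

as $\Delta t = t \to 0$.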


It is surprisingly difficult to construct an example of a continuous function 
which is not differentiable at any point. 


Example 1.2: An example of a continuous, nowhere differentiable function was
given by Weierstrass in 1872: for $0 \le t \le 2\pi$

$$f(t) = \sum_{n=1}^{\infty} \frac{\cos(3^n t)}{2^n}. \qquad (1.2)$$

We don't give a proof of these properties. A justification for continuity is given
by the fact that if a sequence of continuous functions converges uniformly, then the
limit is continuous; a justification for non-differentiability can be provided, in
some sense, by differentiating term by term, which results in a divergent series.


To save repetition the following notations are used: a continuous function f
is said to be a $C$ function; a differentiable function f with continuous derivative
is said to be a $C^1$ function; a twice differentiable function f with continuous
second derivative is said to be a $C^2$ function; etc.


Right and Left-Continuous Functions 


We can rephrase the definition of a continuous function: a function g is called
continuous at the point $t = t_0$ if
$$\lim_{t \to t_0} g(t) = g(t_0); \qquad (1.3)$$
it is called right-continuous (left-continuous) at $t_0$ if the values of the function
g(t) approach $g(t_0)$ when t approaches $t_0$ from the right (left),
$$\lim_{t \downarrow t_0} g(t) = g(t_0), \quad \Bigl(\lim_{t \uparrow t_0} g(t) = g(t_0)\Bigr). \qquad (1.4)$$
If g is continuous it is, clearly, both right and left-continuous. 
The left-continuous version of g, denoted by g(t-), is defined by taking the left
limit at each point,
$$g(t-) = \lim_{s \uparrow t} g(s). \qquad (1.5)$$
From the definitions we have: g is left-continuous if g(t) = g(t-).
The concept of g(t+) is defined similarly,

$$g(t+) = \lim_{s \downarrow t} g(s). \qquad (1.6)$$

If g is a right-continuous function then g(t+) = g(t) for any t, so that g+ = g.


Definition 1.2 A point t is called a discontinuity of the first kind or a jump
point if both limits g(t+) and g(t-) exist and are not equal. The jump at t is
defined as $\Delta g(t) = g(t+) - g(t-)$. Any other discontinuity is said to be of the
second kind.


Example 1.3: The function sin(1/t) for $t \ne 0$ and 0 for t = 0 has a discontinuity of
the second kind at zero, because the limits from the right and from the left don't exist.


An important result is that a function can have at most countably many 
jump discontinuities (see for example Hobson (1921), p.286). 


Theorem 1.3 A function defined on an interval [a,b] can have no more than 
countably many jumps. 


A function, of course, can have more than countably many discontinuities, but
then they are not all jumps, i.e. at some of them the limit from the right or from
the left does not exist.

Another useful result is that a derivative cannot have jump discontinuities 
at all. 


Theorem 1.4 If f is differentiable with a finite derivative f'(t) in an interval,
then at all points f'(t) is either continuous or has a discontinuity of the second
kind.


PROOF: If t is such that $f'(t+) = \lim_{s \downarrow t} f'(s)$ exists (finite or infinite), then
by the Mean Value Theorem the same value is taken by the derivative from
the right,

$$f'(t) = \lim_{\Delta \downarrow 0} \frac{f(t + \Delta) - f(t)}{\Delta} = \lim_{\Delta \downarrow 0,\; t < c < t + \Delta} f'(c) = f'(t+).$$

Similarly for the derivative from the left, f'(t) = f'(t-). Hence f'(t) is continuous
at t. The result follows.


This result explains why functions with continuous derivatives are sought as 
solutions to ordinary differential equations. 
Functions considered in Stochastic Calculus


Functions considered in stochastic calculus are functions without discontinu- 
ities of the second kind, that is functions that have both right and left limits 
at any point of the domain and have one-sided limits at the boundary. These 
functions are called regular functions. It is often agreed to identify functions 
if they have the same right and left limits at any point. 

The class D = D[0,T] of right-continuous functions on [0,T] with left
limits has a special name, càdlàg functions (an abbreviation of the French for
"right continuous with left limits"). Sometimes these functions are called
R.R.C., for regular right continuous. Notice that this class of functions includes
C, the class of continuous functions.

Let $g \in D$ be a càdlàg function; then, by definition, all the discontinuities
of g are jumps. By Theorem 1.3 such functions have no more than countably
many discontinuities.


Remark 1.1: In stochastic calculus $\Delta g(t)$ usually stands for the size of the
jump at t. In standard calculus $\Delta g(t)$ usually stands for the increment of g
over $[t, t + \Delta]$, $\Delta g(t) = g(t + \Delta) - g(t)$. The meaning of $\Delta g(t)$ will be clear
from the context.


1.2 Variation of a Function 


If g is a function of a real variable, its variation over the interval [a,b] is defined
as

$$V_g([a,b]) = \sup \sum_{i=1}^{n} |g(t_i^n) - g(t_{i-1}^n)|, \qquad (1.7)$$
where the supremum is taken over partitions:
$$a = t_0^n < t_1^n < \dots < t_n^n = b. \qquad (1.8)$$
Clearly (by the triangle inequality), the sums in (1.7) increase as new points
are added to the partitions. Therefore the variation of g is

$$V_g([a,b]) = \lim_{\delta_n \to 0} \sum_{i=1}^{n} |g(t_i^n) - g(t_{i-1}^n)|, \qquad (1.9)$$

where $\delta_n = \max_{1 \le i \le n}(t_i - t_{i-1})$. If $V_g([a,b])$ is finite then g is said to be
a function of finite variation on [a,b]. If g is a function of $t \ge 0$, then the
variation function of g as a function of t is defined by

$$V_g(t) = V_g([0,t]).$$

Clearly, $V_g(t)$ is a non-decreasing function of t.
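
As a numerical illustration of the partition sums in (1.7), the following minimal Python sketch evaluates them on nested uniform partitions. The test function g(t) = sin(3t), the interval [0, 2] and the helper names `variation_sum` and `uniform_partition` are assumptions chosen here for illustration; uniform partitions only give a lower estimate of the supremum, so this is a sanity check rather than a proof.

```python
import math

def variation_sum(g, points):
    """Sum of |g(t_i) - g(t_{i-1})| over consecutive partition points."""
    return sum(abs(g(b) - g(a)) for a, b in zip(points, points[1:]))

def uniform_partition(a, b, n):
    """Uniform partition of [a, b] into n subintervals (n + 1 points)."""
    return [a + (b - a) * i / n for i in range(n + 1)]

def g(t):
    # Hypothetical test function, not taken from the text.
    return math.sin(3.0 * t)

# Each partition with 2**(k+1) intervals refines the one with 2**k intervals,
# so by the triangle inequality the sums cannot decrease (up to rounding).
sums = [variation_sum(g, uniform_partition(0.0, 2.0, 2 ** k)) for k in range(1, 12)]

for k, s in zip(range(1, 12), sums):
    print(f"n = {2 ** k:5d}  sum = {s:.6f}")
```

The printed sums grow with refinement and level off, which is the behaviour the limit in the definition of variation describes for a function of finite variation.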
Definition 1.5 g is of finite variation if $V_g(t) < \infty$ for all t. g is of bounded
variation if $\sup_t V_g(t) < \infty$, in other words, if for all t, $V_g(t) < C$, a constant
independent of t.


Example 1.4:

1. If g(t) is increasing then for any i, $g(t_i) \ge g(t_{i-1})$, resulting in a telescoping
sum, where all the terms excluding the first and the last cancel out, leaving
$$V_g(t) = g(t) - g(0).$$
2. If g(t) is decreasing then, similarly,
$$V_g(t) = g(0) - g(t).$$


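Example 1.5: If g(t) is differentiable with continuous derivative g'(t), $g(t) =
\int_0^t g'(s)\,ds$, and $\int_0^t |g'(s)|\,ds < \infty$, then

$$V_g(t) = \int_0^t |g'(s)|\,ds.$$

This can be seen by using the definition and the mean value theorem:
$\int_{t_{i-1}}^{t_i} g'(s)\,ds = g'(\xi_i)(t_i - t_{i-1})$, for some $\xi_i \in (t_{i-1}, t_i)$.
Thus $\bigl|\int_{t_{i-1}}^{t_i} g'(s)\,ds\bigr| = |g'(\xi_i)|(t_i - t_{i-1})$, and

$$V_g(t) = \lim \sum_{i=1}^{n} |g(t_i) - g(t_{i-1})| = \lim \sum_{i=1}^{n} \Bigl|\int_{t_{i-1}}^{t_i} g'(s)\,ds\Bigr| = \lim \sum_{i=1}^{n} |g'(\xi_i)|(t_i - t_{i-1}) = \int_0^t |g'(s)|\,ds.$$

The last equality is due to the last sum being a Riemann sum for the final integral.
Alternatively, the result can be seen from the decomposition of the derivative
into the positive and negative parts,

$$g(t) = \int_0^t g'(s)\,ds = \int_0^t [g'(s)]^+\,ds - \int_0^t [g'(s)]^-\,ds.$$

Notice that $[g'(s)]^-$ is zero when $[g'(s)]^+$ is positive, and the other way around. Using
this one can see that the total variation of g is given by the sum of the variations of
the above integrals. But these integrals are monotone functions with the value zero
at zero. Hence

$$V_g(t) = \int_0^t [g'(s)]^+\,ds + \int_0^t [g'(s)]^-\,ds = \int_0^t \bigl([g'(s)]^+ + [g'(s)]^-\bigr)\,ds = \int_0^t |g'(s)|\,ds.$$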
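
As a worked instance of Example 1.5, take g(t) = sin t on [0, 2π]; this particular choice is an assumption made here for illustration and is not from the text. Both representations of the variation can then be evaluated by hand:

```latex
\begin{align*}
V_g(2\pi) &= \int_0^{2\pi} |\cos s|\,ds
           = \int_0^{2\pi} [\cos s]^+\,ds + \int_0^{2\pi} [\cos s]^-\,ds
           = 2 + 2 = 4, \\
g(2\pi) - g(0) &= \int_0^{2\pi} [\cos s]^+\,ds - \int_0^{2\pi} [\cos s]^-\,ds
           = 2 - 2 = 0.
\end{align*}
```

The net increment vanishes while the variation does not, since the variation adds the contributions of the increasing and decreasing stretches instead of cancelling them.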
Example 1.6: (Variation of a pure jump function).
If g is a regular right-continuous (càdlàg) function or regular left-continuous (càglàd),
and changes only by jumps,

$$g(t) = \sum_{0 \le s \le t} \Delta g(s),$$
then it is easy to see from the definition that

$$V_g(t) = \sum_{0 \le s \le t} |\Delta g(s)|.$$


Example 1.7: The function g(t) = t sin(1/t) for t > 0, and g(0) = 0 is continuous
on [0,1], differentiable at all points except zero, but has infinite variation on any
interval that includes zero. Take the partition $1/(2\pi k + \pi/2)$, $1/(2\pi k - \pi/2)$,
$k = 1, 2, \dots$
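
The partition suggested in Example 1.7 can be explored numerically. The sketch below (function names are hypothetical, chosen for illustration) sums |g(right) - g(left)| over the disjoint intervals with endpoints 1/(2πk + π/2) and 1/(2πk - π/2), where sin(1/t) equals +1 and -1 respectively. Each term equals 1/(2πk + π/2) + 1/(2πk - π/2), which exceeds 1/(πk), so the partial sums dominate a harmonic series and cannot stay bounded.

```python
import math

def g(t):
    """g(t) = t * sin(1/t) for t > 0, and g(0) = 0."""
    return t * math.sin(1.0 / t) if t > 0 else 0.0

def partial_variation(K):
    """Sum of |g(right_k) - g(left_k)| for k = 1..K over the partition of Example 1.7."""
    total = 0.0
    for k in range(1, K + 1):
        left = 1.0 / (2.0 * math.pi * k + math.pi / 2.0)   # sin(1/left) = +1
        right = 1.0 / (2.0 * math.pi * k - math.pi / 2.0)  # sin(1/right) = -1
        total += abs(g(right) - g(left))
    return total

def harmonic_lower_bound(K):
    """Comparison series: sum of 1 / (pi * k) for k = 1..K, which diverges."""
    return sum(1.0 / (math.pi * k) for k in range(1, K + 1))

for K in (10, 100, 1000, 10000):
    print(K, partial_variation(K), harmonic_lower_bound(K))
```

The partial sums keep growing, roughly like (log K)/π, consistent with infinite variation on any interval containing zero.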


The following theorem gives necessary and sufficient conditions for a func- 
tion to have finite variation. 


Theorem 1.6 (Jordan Decomposition) Any function $g: [0,\infty) \to \mathbb{R}$ of
finite variation can be expressed as the difference of two increasing functions

$$g(t) = a(t) - b(t).$$

One such decomposition is given by
$$a(t) = V_g(t), \qquad b(t) = V_g(t) - g(t). \qquad (1.10)$$

It is easy to check that b(t) is increasing, and a(t) is obviously increasing. The
representation of a function of finite variation as a difference of two increasing
functions is not unique. Another decomposition is

$$g(t) = \tfrac{1}{2}\bigl(V_g(t) + g(t)\bigr) - \tfrac{1}{2}\bigl(V_g(t) - g(t)\bigr).$$


The sum, the difference and the product of functions of finite variation are also 
functions of finite variation. This is also true for the ratio of two functions 
of finite variation provided the modulus of the denominator is larger than a 
positive constant. 
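
The decomposition (1.10) has a direct discrete analogue, sketched below under stated assumptions: g is only sampled on a grid, the variation is replaced by the running sum of absolute increments along that grid, and the names `jordan_decomposition`, `ts` and `gs` are illustrative. The sample function sin t on [0, 2π] is likewise an arbitrary choice.

```python
import math

def jordan_decomposition(values):
    """From samples g(t_0), ..., g(t_n) return (a, b), where a is the running
    variation along the grid and b = a - g, so that g = a - b pointwise."""
    a = [0.0]
    for prev, cur in zip(values, values[1:]):
        a.append(a[-1] + abs(cur - prev))
    b = [ai - gi for ai, gi in zip(a, values)]
    return a, b

n = 1000
ts = [2.0 * math.pi * i / n for i in range(n + 1)]
gs = [math.sin(t) for t in ts]          # hypothetical sample function
a, b = jordan_decomposition(gs)

# Increments of b are |dg| - dg >= 0, so b is non-decreasing, as is a.
print(a[-1], b[-1])
```

On the grid, each increment of b equals |Δg| - Δg, which is non-negative; this is the "easy check" that b is increasing.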

The following result follows by Theorem 1.3, and its proof is easy. 


Theorem 1.7 A finite variation function can have no more than countably 
many discontinuities. Moreover, all discontinuities are jumps. 
PROOF: It is enough to establish the result for monotone functions, since a
function of finite variation is a difference of two monotone functions.

A monotone function has left and right limits at any point, therefore any
discontinuity is a jump. The number of jumps of size greater than or equal to $1/n$ is
no more than $(g(b) - g(a))n$. The set of all jump points is the union of the sets
of jump points with the size of the jumps greater than or equal to $1/n$. Since each
such set is finite, the total number of jumps is at most countable.


A sufficient condition for a continuous function to be of finite variation is 
given by the following theorem, the proof of which is outlined in Example 1.5. 


Theorem 1.8 If g is continuous, g' exists and $\int |g'(t)|\,dt < \infty$ then g is of
finite variation.


Theorem 1.9 (Banach) Let g(t) be a continuous function on [0,1], and denote
by s(a) the number of t's with g(t) = a. Then the variation of g is

$$\int_{-\infty}^{\infty} s(a)\,da.$$


Continuous and Discrete Parts of a Function 


Let g(t), $t \ge 0$, be a right-continuous increasing function. Then it can have
at most countably many jumps; moreover, the sum of the jumps is finite over
finite time intervals. Define the discontinuous part $g^d$ of g by

$$g^d(t) = \sum_{s \le t} \bigl(g(s) - g(s-)\bigr) = \sum_{0 < s \le t} \Delta g(s), \qquad (1.11)$$

and the continuous part $g^c$ of g by
$$g^c(t) = g(t) - g^d(t). \qquad (1.12)$$

Clearly, $g^d$ changes only by jumps, $g^c$ is continuous and $g(t) = g^c(t) + g^d(t)$.
Since a finite variation function is the difference of two increasing functions,
the decomposition (1.12) holds for functions of finite variation. Although the
representation as the difference of increasing functions is not unique, decomposition
(1.12) is essentially unique, in the sense that any two such decompositions
differ by a constant. Indeed, if there were another such decomposition
$g(t) = h^c(t) + h^d(t)$, then $h^c(t) - g^c(t) = g^d(t) - h^d(t)$, implying that $h^d - g^d$ is
continuous. Hence $h^d$ and $g^d$ have the same set of jump points, and it follows
that $h^d(t) - g^d(t) = c$ for some constant c.
Quadratic Variation


If g is a function of a real variable, define its quadratic variation over the interval
[0, t] as the limit (when it exists)

$$[g](t) = \lim_{\delta_n \to 0} \sum_{i=1}^{n} \bigl(g(t_i^n) - g(t_{i-1}^n)\bigr)^2, \qquad (1.13)$$

where the limit is taken over partitions: $0 = t_0^n < t_1^n < \dots < t_n^n = t$, with
$\delta_n = \max_{1 \le i \le n}(t_i^n - t_{i-1}^n)$.


Remark 1.2: Similarly to the concept of variation, there is a concept of $\Phi$-variation
of a function. If $\Phi(u)$ is a positive function, increasing monotonically
with u, then the $\Phi$-variation of g on [0, t] is

$$V_\Phi[g] = \sup \sum_{i=1}^{n} \Phi\bigl(|g(t_i^n) - g(t_{i-1}^n)|\bigr), \qquad (1.14)$$

where the supremum is taken over all partitions. Functions with finite $\Phi$-variation
on [0, t] form a class $V_\Phi$. With $\Phi(u) = u$ one obtains the class $VF$ of functions
of finite variation; with $\Phi(u) = u^p$ one obtains the class of functions of p-th
finite variation, $VF_p$. If $1 \le p < q < \infty$, then finite p-variation implies finite
q-variation.

The stochastic calculus definition of quadratic variation is different from the
classical one with p = 2 (unlike for the first variation p = 1, when they are
the same). In stochastic calculus the limit in (1.13) is taken over shrinking
partitions with $\delta_n = \max_{1 \le i \le n}(t_i^n - t_{i-1}^n) \to 0$, and not over all possible
partitions. We shall use only the stochastic calculus definition.


Quadratic variation plays a major role in stochastic calculus, but is hardly 
ever met in standard calculus due to the fact that smooth functions have zero 
quadratic variation. 


Theorem 1.10 If g is continuous and of finite variation then its quadratic 
variation is zero. 


PROOF:
$$[g](t) = \lim_{\delta_n \to 0} \sum_{i=0}^{n-1} \bigl(g(t_{i+1}^n) - g(t_i^n)\bigr)^2$$
$$\le \lim_{\delta_n \to 0} \max_i |g(t_{i+1}^n) - g(t_i^n)| \sum_{i=0}^{n-1} |g(t_{i+1}^n) - g(t_i^n)|$$
$$\le \lim_{\delta_n \to 0} \max_i |g(t_{i+1}^n) - g(t_i^n)|\, V_g(t).$$
Since g is continuous, it is uniformly continuous on [0, t], hence
$\lim_{\delta_n \to 0} \max_i |g(t_{i+1}^n) - g(t_i^n)| = 0$, and the result follows.
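
The bound used in the proof of Theorem 1.10 can be checked numerically. In this minimal sketch the function g(t) = sin t on [0, 1] and the names `quadratic_variation_sum`, `ns` and `qv` are assumptions for illustration. On [0, 1] this g is increasing with |g'| ≤ 1, so the largest increment is at most 1/n and V_g(1) = sin 1; the sums of squared increments should therefore be at most sin(1)/n.

```python
import math

def quadratic_variation_sum(g, t, n):
    """Sum of squared increments of g over the uniform partition of [0, t] into n parts."""
    pts = [t * i / n for i in range(n + 1)]
    return sum((g(b) - g(a)) ** 2 for a, b in zip(pts, pts[1:]))

ns = (10, 100, 1000, 10000)
qv = [quadratic_variation_sum(math.sin, 1.0, n) for n in ns]

for n, q in zip(ns, qv):
    # Bound from the proof: (max increment) * (variation) <= (1 / n) * sin(1).
    print(n, q, math.sin(1.0) / n)
```

The sums shrink in proportion to the mesh size, which is why quadratic variation is invisible for smooth functions.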


Note that there are functions with zero quadratic variation and infinite
variation (called functions of zero energy).

Define the quadratic covariation (or simply covariation) of f and g on [0, t]
by the following limit (when it exists)

$$[f, g](t) = \lim_{\delta_n \to 0} \sum_{i=0}^{n-1} \bigl(f(t_{i+1}^n) - f(t_i^n)\bigr)\bigl(g(t_{i+1}^n) - g(t_i^n)\bigr), \qquad (1.15)$$

where the limit is taken over partitions $\{t_i^n\}$ of [0, t] with $\delta_n = \max_i (t_{i+1}^n - t_i^n)$.
The same proof as for Theorem 1.10 works for the following result.


Theorem 1.11 If f is continuous and g is of finite variation, then their covariation
is zero, [f, g](t) = 0.


Let f and g be such that their quadratic variation is defined. By using 
simple algebra, one can see that covariation satisfies 


Theorem 1.12 (Polarization identity)

$$[f, g](t) = \tfrac{1}{2}\bigl([f + g, f + g](t) - [f, f](t) - [g, g](t)\bigr). \qquad (1.16)$$


It is obvious that covariation is symmetric, [f, g](t) = [g, f](t); it follows
from (1.16) that it is linear, that is, for any constants $\alpha$ and $\beta$

$$[\alpha f + \beta g, h](t) = \alpha [f, h](t) + \beta [g, h](t). \qquad (1.17)$$

Due to symmetry it is bilinear, that is, linear in both arguments. Thus the
quadratic variation of the sum can be expanded similarly to multiplication of
sums $(\alpha_1 f + \beta_1 g)(\alpha_2 h + \beta_2 k)$. It follows from the definition of quadratic
variation that it is a non-decreasing function of t, and consequently it is
of finite variation. By the polarization identity, covariation is also of finite
variation. More about quadratic variation is given in the Stochastic Calculus
chapter.
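
The "simple algebra" behind the polarization identity already holds for the approximating sums, before any limit is taken. The sketch below checks this on one partition; the functions sin and exp, the partition of [0, 1] into 50 parts, and the helper names are assumptions made for illustration.

```python
import math

def covariation_sum(f, g, pts):
    """Approximating sum for [f, g]: products of increments over consecutive points."""
    return sum((f(b) - f(a)) * (g(b) - g(a)) for a, b in zip(pts, pts[1:]))

def polarization_rhs(f, g, pts):
    """Right-hand side of the polarization identity, computed from approximating sums."""
    def h(t):
        return f(t) + g(t)
    return 0.5 * (covariation_sum(h, h, pts)
                  - covariation_sum(f, f, pts)
                  - covariation_sum(g, g, pts))

pts = [i / 50 for i in range(51)]
lhs = covariation_sum(math.sin, math.exp, pts)
rhs = polarization_rhs(math.sin, math.exp, pts)
print(lhs, rhs)
```

The two numbers agree up to rounding because (x + y)^2 - x^2 - y^2 = 2xy holds for each pair of increments.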


1.3 Riemann Integral and Stieltjes Integral 


Riemann Integral 


The Riemann Integral of f over the interval [a, b] is defined as the limit of Riemann
sums

$$\int_a^b f(t)\,dt = \lim_{\delta \to 0} \sum_{i=1}^{n} f(\xi_i^n)(t_i^n - t_{i-1}^n), \qquad (1.18)$$
where the $t_i^n$'s represent partitions of the interval,

$$a = t_0^n < t_1^n < \dots < t_n^n = b, \quad \delta = \max_{1 \le i \le n}(t_i^n - t_{i-1}^n), \quad \text{and} \quad t_{i-1}^n \le \xi_i^n \le t_i^n.$$

It is possible to show that the Riemann Integral is well defined for continuous
functions, and by splitting up the interval, it can be extended to functions
which are discontinuous at finitely many points.

Calculation of integrals is often done by using the antiderivative, and is
based on the following result.


Theorem 1.13 (The fundamental theorem of calculus) If f is differentiable
on [a,b] and f' is Riemann integrable on [a,b] then

$$f(b) - f(a) = \int_a^b f'(s)\,ds.$$


In general, this result cannot be applied to discontinuous functions; see the example
below. For such functions a jump term must be added, see (1.20).


Example 1.8: Let f(t) = 2 for $1 \le t \le 2$, f(t) = 1 for $0 \le t < 1$. Then f'(t) = 0
at all $t \ne 1$, and $\int_0^t f'(s)\,ds = 0 \ne f(t)$. f is continuous and differentiable at all points
but one, the derivative is integrable, but the function does not equal the integral of
its derivative.


Main tools for calculations of Riemann integrals are change of variables and 
integration by parts. These are reviewed below in a more general framework 
of the Stieltjes integral. 


Stieltjes Integral 


The Stieltjes Integral is an integral of the form $\int_a^b f(t)\,dg(t)$, where g is a
function of finite variation. Since a function of finite variation is a difference 
of two increasing functions, it is sufficient to define the integral with respect 
to monotone functions. 


Stieltjes Integral with respect to Monotone Functions 


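The Stieltjes Integral of f with respect to a monotone function g over an
interval (a, b] is defined as

$$\int_a^b f\,dg = \int_a^b f(t)\,dg(t) = \lim_{\delta \to 0} \sum_{i=1}^{n} f(\xi_i^n)\bigl(g(t_i^n) - g(t_{i-1}^n)\bigr), \qquad (1.19)$$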


with the quantities appearing in the definition being the same as above for the 
Riemann Integral. This integral is a generalization of the Riemann Integral, 
which is recovered when we take g(t) = t. This integral is also known as the 
Riemann-Stieltjes integral. 
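
The approximating sums in (1.19) are straightforward to compute. The following is a minimal sketch, assuming a uniform partition and the left endpoint as the intermediate point; the name `stieltjes_sum` and the test functions f(t) = t, g(t) = t^2 on [0, 1] are illustrative choices, not from the text. For this smooth g the limit is the ordinary integral of f(t)g'(t) = 2t^2, that is 2/3, and with g(t) = t the sum reduces to a Riemann sum for the integral of t, that is 1/2.

```python
def stieltjes_sum(f, g, a, b, n):
    """Left-endpoint Riemann-Stieltjes sum of f with respect to g on a uniform
    partition of [a, b] into n subintervals."""
    pts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f(s) * (g(t) - g(s)) for s, t in zip(pts, pts[1:]))

# Integrator g(t) = t**2: the sums approach 2/3 as n grows.
approx_stieltjes = stieltjes_sum(lambda t: t, lambda t: t * t, 0.0, 1.0, 10000)

# Integrator g(t) = t: the Riemann integral of t over [0, 1] is recovered.
approx_riemann = stieltjes_sum(lambda t: t, lambda t: t, 0.0, 1.0, 10000)

print(approx_stieltjes, approx_riemann)
```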
Particular Cases


If g'(t) exists, and $g(t) = g(0) + \int_0^t g'(s)\,ds$, then it is possible to show that

$$\int_a^b f(t)\,dg(t) = \int_a^b f(t) g'(t)\,dt.$$


If $g(t) = \sum_{k=a}^{[t]} h(k)$ (a integer, and [t] stands for the integer part of t) then

$$\int_a^b f(t)\,dg(t) = \sum_{a < k \le b} f(k) h(k).$$


This property allows us to represent sums as integrals. 


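Example 1.9:
1. $g(t) = 2t^2$: $\int_{-\infty}^{\infty} f(t)\,dg(t) = 4\int_{-\infty}^{\infty} t f(t)\,dt$.
2. Let
$$g(t) = \begin{cases} 0, & t < 0, \\ 2, & 0 \le t < 1, \\ 3, & 1 \le t < 2, \\ 5, & 2 \le t. \end{cases}$$
Then
$$\int_{-\infty}^{\infty} f(t)\,dg(t) = 2f(0) + f(1) + 2f(2).$$
If, for example, f(t) = t then $\int_{-\infty}^{\infty} t\,dg(t) = 5$. If $f(t) = (t+1)^2$
then $\int_{-\infty}^{\infty} (t+1)^2\,dg(t) = 2 + 4 + 18 = 24$.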
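
The second part of Example 1.9 can be reproduced in a few lines. For a step function g, the Stieltjes integral reduces to a sum of the values of f at the jump points weighted by the jump sizes; in this sketch the name `stieltjes_integral_step` and the list-of-pairs representation of the jumps are assumptions for illustration. The jumps are read off from g: 2 at t = 0, 1 at t = 1 and 2 at t = 2.

```python
def stieltjes_integral_step(f, jumps):
    """Integral of f against a pure-jump step function g, given as a list of
    (jump point, jump size) pairs: the sum of f(point) * size."""
    return sum(f(s) * size for s, size in jumps)

# g jumps from 0 to 2 at t = 0, from 2 to 3 at t = 1, and from 3 to 5 at t = 2.
jumps = [(0.0, 2.0), (1.0, 1.0), (2.0, 2.0)]

value_linear = stieltjes_integral_step(lambda t: t, jumps)
value_square = stieltjes_integral_step(lambda t: (t + 1.0) ** 2, jumps)
print(value_linear, value_square)  # 5.0 24.0
```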
Let g be a function of finite variation and
$$g(t) = a(t) - b(t)$$

with $a(t) = V_g(t)$, $b(t) = V_g(t) - g(t)$, which are non-decreasing functions. If

$$\int_0^t |f(s)|\,da(s) = \int_0^t |f(s)|\,dV_g(s) = \int_0^t |f(s)|\,|dg(s)| < \infty$$

then f is Stieltjes-integrable with respect to g and its integral is defined by

$$\int_{(0,t]} f(s)\,dg(s) = \int_{(0,t]} f(s)\,da(s) - \int_{(0,t]} f(s)\,db(s).$$

Notation: $\int_0^t f(s)\,dg(s) = \int_{(0,t]} f(s)\,dg(s)$.

Note: $\int_{(0,t]} dg(s) = g(t) - g(0)$ and $\int_{(0,t)} dg(s) = g(t-) - g(0)$.

If f is Stieltjes-integrable with respect to a function g of finite variation,
then the variation of the integral is

$$V(t) = \int_0^t |f(s)|\,|dg(s)| = \int_0^t |f(s)|\,dV_g(s).$$
Impossibility of a direct definition of an integral with respect to
functions of infinite variation


In stochastic calculus we need to consider integrals with respect to functions of
infinite variation. Such functions arise, for example, as models of stock prices.
Integrals with respect to a function of infinite variation cannot be defined as
a usual limit of approximating sums. The following result explains why (see, for
example, Protter (1992), p.40).


Theorem 1.14 Let $\delta_n = \max_i (t_i^n - t_{i-1}^n)$ denote the largest interval in the
partition of [a,b]. If

$$\lim_{\delta_n \to 0} \sum_{i=1}^{n} f(t_{i-1}^n)\bigl[g(t_i^n) - g(t_{i-1}^n)\bigr]$$

exists for any continuous function f then g must be of finite variation on [a,b].

This shows that if g has infinite variation then the limit of the approximating
sums does not exist for some functions f.

Integration by Parts 


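Let f and g be functions of finite variation. Denote here $\Delta g(s) = g(s) - g(s-)$,
then (with integrals on (a, b])

$$f(b)g(b) - f(a)g(a) = \int_a^b f(s-)\,dg(s) + \int_a^b g(s-)\,df(s) + \sum_{a < s \le b} \Delta f(s) \Delta g(s)$$
$$= \int_a^b f(s-)\,dg(s) + \int_a^b g(s)\,df(s). \qquad (1.20)$$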
The last equation is obtained by putting together the sum of jumps with one 
of the integrals. 

Note that although the sum in (1.20) is written over uncountably many 
values a < s < b, it has at most countably many non-zero terms. This is 
because a finite variation function can have at most a countable number of 
jumps. 

If g is continuous so that g(s-) = g(s) for all s then the formula simplifies
and in this case we have the familiar integration by parts formula

$$f(b)g(b) - f(a)g(a) = \int_a^b f(s)\,dg(s) + \int_a^b g(s)\,df(s).$$
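Example 1.10: Let g(s) be of finite variation, g(0) = 0, and consider $g^2(s)$. Using
the integration by parts with f = g, we have

$$g^2(t) = 2\int_0^t g(s-)\,dg(s) + \sum_{s \le t} (\Delta g(s))^2.$$

In other words,

$$\int_0^t g(s-)\,dg(s) = \frac{g^2(t)}{2} - \frac{1}{2}\sum_{s \le t} (\Delta g(s))^2.$$

Now using the formula (1.20) above we also have

$$\int_0^t g(s)\,dg(s) = g^2(t) - \int_0^t g(s-)\,dg(s) = \frac{g^2(t)}{2} + \frac{1}{2}\sum_{s \le t} (\Delta g(s))^2.$$

Thus it follows that

$$\int_0^t g(s-)\,dg(s) \le \frac{g^2(t)}{2} \le \int_0^t g(s)\,dg(s).$$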


Change of Variables 


Let f have a continuous derivative ($f \in C^1$) and g be of finite variation and
continuous, then

$$f(g(t)) - f(g(0)) = \int_0^t f'(g(s))\,dg(s) = \int_{g(0)}^{g(t)} f'(u)\,du.$$

If g is of finite variation, has jumps, and is right-continuous then

$$f(g(t)) - f(g(0)) = \int_0^t f'(g(s-))\,dg(s) + \sum_{0 < s \le t} \bigl(f(g(s)) - f(g(s-)) - f'(g(s-))\Delta g(s)\bigr),$$

where $\Delta g(s) = g(s) - g(s-)$ denotes the jump of g at s. This is known in
stochastic calculus as Itô's formula.


Example 1.11: Take $f(x) = x^2$, then we obtain

$$g^2(t) - g^2(0) = 2\int_0^t g(s-)\,dg(s) + \sum_{s \le t} (\Delta g(s))^2.$$
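
For a pure jump function the formula of Example 1.11 can be verified directly, since the integral of g(s-) with respect to g is then a finite sum of g(s-) times the jump at s. The sketch below assumes a right-continuous step function with g(0) = 0 described only by its successive jump sizes; the function name and the sample jumps are hypothetical.

```python
def check_square_formula(jump_sizes):
    """For a pure-jump g with g(0) = 0, return the pair
    (g(t)**2, 2 * integral of g(s-) dg(s) + sum of squared jumps)."""
    g_left = 0.0      # value g(s-) just before the current jump
    integral = 0.0    # running sum of g(s-) * (jump at s)
    sum_sq = 0.0      # running sum of squared jumps
    for dg in jump_sizes:
        integral += g_left * dg
        sum_sq += dg ** 2
        g_left += dg  # after the jump, g(s) = g(s-) + dg
    return g_left ** 2, 2.0 * integral + sum_sq

lhs, rhs = check_square_formula([1.0, 2.0, -0.5, 3.0])
print(lhs, rhs)  # 30.25 30.25
```

Each step uses (x + d)^2 - x^2 = 2xd + d^2 with x = g(s-) and d the jump, so the agreement is exact for every sequence of jumps.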


Remark 1.3: Note that for a continuous f and finite variation g on [0, t] the
approximating sums converge as $\delta = \max_i (t_{i+1}^n - t_i^n) \to 0$,

$$\sum_i f\bigl(g(t_i^n)\bigr)\bigl(g(t_{i+1}^n) - g(t_i^n)\bigr) \to \int_0^t f(g(s-))\,dg(s).$$
Remark 1.4: One of the shortcomings of Riemann or Stieltjes integrals is that
they don't preserve the monotone convergence property, that is, for a sequence
of functions $f_n \uparrow f$ it does not necessarily follow that their integrals converge.
The Lebesgue (or Lebesgue-Stieltjes) Integral preserves this property.


1.4 Lebesgue’s Method of Integration 


While Riemann sums are constructed by dividing the domain of integration
on the x-axis, the interval [a,b], into smaller subintervals, Lebesgue sums are
constructed by dividing the range of the function on the y-axis, the interval
[c,d], into smaller subintervals $c = y_0 < y_1 < \dots < y_k < \dots < y_n = d$ and
forming sums

$$\sum_{k=0}^{n-1} y_k \,\mathrm{length}\bigl(\{t : y_k \le f(t) < y_{k+1}\}\bigr).$$

The Lebesgue Integral is the limit of the above sums as the number of points 
in the partition increases. It turns out that the Lebesgue Integral is more 
general than the Riemann Integral, and preserves convergence. This approach 
also allows integration of functions in abstract probability spaces more general 
than $\mathbb{R}$ or $\mathbb{R}^n$; it requires additional concepts and is made more precise in
the next chapter (see Section 2.3). 
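
The Lebesgue sums described above can be computed explicitly when the level sets are known. The sketch below assumes f(t) = t^2 on [0, 1] (an illustrative choice), whose range [0, 1] is split into n equal levels; the set where y_k ≤ t^2 < y_{k+1} is the interval from sqrt(y_k) to sqrt(y_{k+1}), so its length is available in closed form. The function name is hypothetical.

```python
import math

def lebesgue_sum_square(n):
    """Lebesgue sum for f(t) = t**2 on [0, 1] with the range [0, 1] split into n equal levels."""
    total = 0.0
    for k in range(n):
        y_lo, y_hi = k / n, (k + 1) / n
        # {t in [0, 1] : y_lo <= t**2 < y_hi} is the interval [sqrt(y_lo), sqrt(y_hi)).
        length = math.sqrt(y_hi) - math.sqrt(y_lo)
        total += y_lo * length
    return total

for n in (10, 100, 1000):
    print(n, lebesgue_sum_square(n))
```

Since f exceeds y_k by less than 1/n on each level set and the level sets have total length 1, each sum lies within 1/n below the integral of t^2 over [0, 1], which is 1/3.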


Remark 1.5: In folklore the following analogy is used. Imagine that money 
is spread out on a floor. In the Riemann method of integration, you collect 
the money as you progress in the room. In the Lebesgue method, first you 
collect $100 bills everywhere you can find them, then $50, etc. 


1.5 Differentials and Integrals 


The differential df(t) of a differentiable function f at t is defined as the part
of the increment at t, $f(t + \Delta) - f(t)$, that is linear in $\Delta t$. If the differential of the
independent variable is denoted $dt = \Delta t$, then $f(t + dt) - f(t) = df(t)$ +
smaller order terms, and it follows from the existence of the derivative at t
that


$$df(t) = f'(t)\,dt. \qquad (1.21)$$
If g is also a differentiable function of t, then f(g(t)) is differentiable, and
$$df(g(t)) = f'(g(t)) g'(t)\,dt = f'(g(t))\,dg(t), \qquad (1.22)$$


which is known as the chain rule. 
Differential calculus is important in applications because many physical
problems can be formulated in terms of differential equations. The main re- 
lation between the integral and the differential (or derivative) is given by the 
fundamental theorem of calculus, Theorem 1.13. 

For differentiable functions, differential equations of the form 


$$df(t) = \varphi(t)\,dw(t)$$


can be written in the integral form


$$f(t) = f(0) + \int_0^t \varphi(s)\,dw(s).$$


In Stochastic Calculus stochastic differentials do not formally exist and the 
random functions w(t) are not differentiable at any point. By introducing a 
new (stochastic) integral, stochastic differential equations can be defined, and, 
by definition, solutions to these equations are given by the solutions to the 
corresponding stochastic integral equations. 


1.6 Taylor’s Formula and Other Results 


This section contains Taylor’s Formula and conditions on functions used in 
results on differential equations. It may be treated as an appendix, and referred 
to only when needed. 


Taylor’s Formula for Functions of One Variable 


If we consider the increment of a function $f(x) - f(x_0)$ over the interval $[x_0, x]$,
then provided $f'(x_0)$ exists, the differential at $x_0$ is the linear part in $(x - x_0)$
of this increment and it provides the first approximation to the increment.
Taylor’s formula gives a better approximation by taking higher order terms
of powers of $(x - x_0)$, provided higher derivatives of f at $x_0$ exist. If f is a
function of x with derivatives up to order n + 1, then


$$f(x) - f(x_0) = f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 + \frac{1}{3!} f^{(3)}(x_0)(x - x_0)^3 + \cdots + \frac{1}{n!} f^{(n)}(x_0)(x - x_0)^n + R_n(x, x_0),$$


where $R_n$ is the remainder, and $f^{(n)}$ is the derivative of $f^{(n-1)}$. The remainder
can be written in the form


$$R_n(x, x_0) = \frac{1}{(n+1)!} f^{(n+1)}(\theta_n)(x - x_0)^{n+1}$$


for some point $\theta_n \in (x_0, x)$.
In our applications we shall use this formula with two terms.


$$f(x) - f(x_0) = f'(x_0)(x - x_0) + \frac{1}{2} f''(\theta)(x - x_0)^2, \tag{1.23}$$
for some point $\theta \in (x_0, x)$.
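
As a numerical sanity check of the two-term formula (1.23), the following Python sketch takes f(x) = e^x, which is an illustrative choice and not an example from the text, and solves for the intermediate point θ. The function name `taylor_theta` is hypothetical.

```python
import math

def taylor_theta(x0: float, x: float) -> float:
    """For f = exp, solve f(x) - f(x0) = f'(x0)(x - x0) + 0.5 f''(theta)(x - x0)^2 for theta."""
    # Remainder after the linear term; for f = exp every derivative is exp.
    remainder = math.exp(x) - math.exp(x0) - math.exp(x0) * (x - x0)
    # f''(theta) = exp(theta), hence theta = log(2 * remainder / (x - x0)^2).
    return math.log(2.0 * remainder / (x - x0) ** 2)

theta = taylor_theta(0.0, 1.0)
print(theta)  # a point strictly between x0 = 0 and x = 1
```

The computed θ falls inside the interval between x0 and x, as the formula requires.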


Taylor’s Formula for Functions of Several Variables 


Similarly to the one-dimensional case, Taylor’s formula gives successive
approximations to the increment of a function. A function of n real variables
$f(x_1, x_2, \ldots, x_n)$ is differentiable at a point $x = (x_1, x_2, \ldots, x_n)$ if the increment
at this point can be approximated by a linear part, which is the differential of
f at x:


$$\Delta f(x) = \sum_{i=1}^n C_i \Delta x_i + o(\rho), \quad \text{when } \rho = \sqrt{\sum_{i=1}^n (\Delta x_i)^2} \ \text{ and } \ \lim_{\rho \to 0} \frac{o(\rho)}{\rho} = 0. \tag{1.24}$$


If f is differentiable at $x = (x_1, x_2, \ldots, x_n)$, then in particular it is
differentiable as a function of any one variable $x_i$ at that point, when all the other
coordinates are kept fixed. The derivative with respect to $x_i$ is called the
partial derivative $\partial f/\partial x_i$. Unlike in the one-dimensional case, the existence
of all partial derivatives $\partial f/\partial x_i$ at x is necessary but not sufficient for
differentiability of f at x. But if all partial derivatives exist and are continuous at
that point, then f is differentiable at that point; moreover, $C_i$ in (1.24) is given
by the value of $\partial f/\partial x_i$ at x. If we define the differential of the independent
variable as its increment $dx_i = \Delta x_i$, then we have


Theorem 1.15 For f to be differentiable at a point, it is necessary that f 
has partial derivatives at that point, and it is sufficient that it has continuous 
partial derivatives at that point. If f is differentiable at x, then its differential 
at x is given by 


$$df(x_1, x_2, \ldots, x_n) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x_1, x_2, \ldots, x_n)\,dx_i. \tag{1.25}$$


The first approximation of the increment of a differentiable function is the 
differential, 


$$\Delta f(x) \approx df(x).$$
If f possesses higher order partial derivatives, then further approximation is
possible and it is given by Taylor’s formula. In Stochastic Calculus the second
order approximation plays an important role.

Let $f: \mathbb{R}^n \to \mathbb{R}$ be $C^2$ (that is, $f(x_1, x_2, \ldots, x_n)$ has continuous partial
derivatives up to order two), $x = (x_1, x_2, \ldots, x_n)$, $x + \Delta x = (x_1 + \Delta x_1, x_2 +
\Delta x_2, \ldots, x_n + \Delta x_n)$; then by considering the function of one variable
$g(t) = f(x + t\Delta x)$ for $0 \le t \le 1$, the following result is obtained.

$$\Delta f(x_1, x_2, \ldots, x_n) = f(x + \Delta x) - f(x) \approx \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x_1, x_2, \ldots, x_n)\,dx_i + \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(x_1 + \theta \Delta x_1, \ldots, x_n + \theta \Delta x_n)\,dx_i\,dx_j, \tag{1.26}$$


where, just as in the case of one variable, the second derivatives are evaluated
at some “middle” point $(x_1 + \theta \Delta x_1, \ldots, x_n + \theta \Delta x_n)$ for some $\theta \in (0,1)$, and
$dx_i = \Delta x_i$.

Lipschitz and Hölder Conditions


Lipschitz and Hölder conditions describe subclasses of continuous functions.
They appear as conditions on the coefficients in the results on the existence
and uniqueness of solutions of ordinary and stochastic differential equations.


Definition 1.16 f satisfies a Hölder condition (is Hölder continuous) of order
$\alpha$, $0 < \alpha \le 1$, on [a,b] (or on $\mathbb{R}$) if there is a constant K > 0, so that for all
$x, y \in [a,b]$ (respectively $\mathbb{R}$)


$$|f(x) - f(y)| \le K|x - y|^{\alpha}. \tag{1.27}$$
A Lipschitz condition is a Hölder condition with $\alpha = 1$,
$$|f(x) - f(y)| \le K|x - y|. \tag{1.28}$$


It is easy to see that a function that is Hölder continuous of order $\alpha$ on [a,b] is also
Hölder continuous of any lesser order.


Example 1.12: The function $f(x) = \sqrt{x}$ on $[0, \infty)$ is Hölder continuous with
$\alpha = 1/2$ but is not Lipschitz, since its derivative is unbounded near zero. To see that
it is Hölder, it is enough to show that for all $x, y \ge 0$, $x \ne y$, the following ratio is bounded,


$$\frac{|\sqrt{x} - \sqrt{y}|}{\sqrt{|x - y|}}. \tag{1.29}$$


It is an elementary exercise to establish that this ratio is bounded by dividing
through by $\sqrt{y}$ (if y = 0, then the bound is obviously one), and applying l’Hôpital’s
rule. Similarly $|x|^r$, 0 < r < 1, is Hölder of order r.
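
The claim of Example 1.12 can be probed numerically. The sketch below is only an illustration on randomly sampled points (the sample size, range and seed are arbitrary assumptions): it evaluates the Hölder ratio of order 1/2 from (1.29) and, for contrast, the Lipschitz ratio near zero. The function names are hypothetical.

```python
import math
import random

def holder_ratio(x: float, y: float) -> float:
    """Ratio |sqrt(x) - sqrt(y)| / sqrt(|x - y|) for x != y, x, y >= 0."""
    return abs(math.sqrt(x) - math.sqrt(y)) / math.sqrt(abs(x - y))

def lipschitz_ratio(x: float, y: float) -> float:
    """Ratio |sqrt(x) - sqrt(y)| / |x - y| for x != y, x, y >= 0."""
    return abs(math.sqrt(x) - math.sqrt(y)) / abs(x - y)

random.seed(0)  # fixed seed so the experiment is reproducible
pairs = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(10_000)]
max_holder = max(holder_ratio(x, y) for x, y in pairs if x != y)

print(max_holder)                   # stays below 1 on the sample
print(lipschitz_ratio(1e-8, 0.0))   # large: no single Lipschitz constant works near 0
```

The Hölder ratio never exceeds one, because $|\sqrt{x} - \sqrt{y}|^2 \le |\sqrt{x} - \sqrt{y}|(\sqrt{x} + \sqrt{y}) = |x - y|$, while the Lipschitz ratio grows without bound as both points approach zero.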


A simple sufficient condition for a function to be Lipschitz is to be
continuous and piecewise smooth; precise definitions follow.

Definition 1.17 f is smooth on [a,b] if it possesses a continuous derivative
f' on (a,b) such that the limits f'(a+) and f'(b−) exist.


Definition 1.18 f is piecewise continuous on [a,b] if it is continuous on [a, b] 
except possibly a finite number of points at which right-hand and left-hand 
limits exist. 


Definition 1.19 f is piecewise smooth on [a,b] if it is piecewise continuous 
on [a,b] and f’ exists and is also piecewise continuous on [a,b]. 


Growth Conditions 


The linear growth condition also appears in the results on existence and uniqueness
of solutions of differential equations. f(x) satisfies the linear growth condition
if

$$|f(x)| \le K(1 + |x|). \tag{1.30}$$
This condition describes the growth of a function for large values of x, and 
states that f is bounded for small values of x. 


Example 1.13: It can be shown that if f(0,t) is a bounded function of t, $|f(0,t)| \le C$
for all t, and f(x,t) satisfies the Lipschitz condition in x uniformly in t,
$|f(x,t) - f(y,t)| \le K|x - y|$, then f(x,t) satisfies the linear growth condition in x,
$|f(x,t)| \le K_1(1 + |x|)$.


The polynomial growth condition on f is the condition of the form
$$|f(x)| \le K(1 + |x|^m), \quad \text{for some } K, m > 0. \tag{1.31}$$


Theorem 1.20 (Gronwall’s inequality) Let f(t), g(t) and h(t) be
non-negative on [0,T], and for all $0 \le t \le T$

$$f(t) \le g(t) + \int_0^t h(s) f(s)\,ds. \tag{1.32}$$


Then for $0 \le t \le T$


$$f(t) \le g(t) + \int_0^t h(s) g(s) \exp\left(\int_s^t h(u)\,du\right) ds. \tag{1.33}$$


This form can be found, for example, in Dieudonné (1960).


Solution of First Order Linear Differential Equations


Linear differential equations, by definition, are linear in the unknown function
and its derivatives. A first order linear equation, in which the coefficient of
$\frac{dx(t)}{dt}$ does not vanish, can be written in the form


$$\frac{dx(t)}{dt} + g(t)x(t) = k(t). \tag{1.34}$$


These equations are solved by using the Integrating Factor Method. The
integrating factor is the function $e^{G(t)}$, where G(t) is chosen by G'(t) = g(t).
After multiplying both sides of the equation by $e^{G(t)}$, integrating, and solving
for x(t), we have


$$x(t) = e^{-G(t)} \int_0^t e^{G(s)} k(s)\,ds + x(0)\,e^{G(0) - G(t)}. \tag{1.35}$$


The function G(t) is determined up to a constant, but it is clear from
(1.35) that the solution x(t) remains the same.
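
The integrating factor formula (1.35) can be evaluated numerically. The sketch below is a minimal illustration, assuming the particular choice $G(t) = \int_0^t g(s)\,ds$ (so that G(0) = 0) and a simple trapezoid rule; the function name `solve_linear_ode` and the test coefficients are hypothetical.

```python
import math

def solve_linear_ode(g, k, x0, t, n=20_000):
    """Evaluate x(t) from (1.35) for x' + g(t) x = k(t), x(0) = x0, with G(0) = 0."""
    h = t / n
    G = 0.0                       # running value of G(s) = integral of g over [0, s]
    integral = 0.0                # running value of the integral of e^{G(r)} k(r) over [0, s]
    prev = math.exp(G) * k(0.0)
    for i in range(1, n + 1):
        s = i * h
        G += 0.5 * h * (g(s - h) + g(s))      # trapezoid rule for G
        cur = math.exp(G) * k(s)
        integral += 0.5 * h * (prev + cur)    # trapezoid rule for the outer integral
        prev = cur
    return math.exp(-G) * integral + x0 * math.exp(-G)

# Constant coefficients g = 2, k = 3, x(0) = 1: the exact solution is 1.5 - 0.5 e^{-2t}.
print(solve_linear_ode(lambda s: 2.0, lambda s: 3.0, 1.0, 1.0))
print(1.5 - 0.5 * math.exp(-2.0))
```

The two printed values agree up to the discretisation error of the trapezoid rule.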


Further Results on Functions and Integration 


Results given here are not required to understand subsequent material. Some
of these involve the concept of a set of zero Lebesgue measure. This is given
in the next chapter (see Section 2.2); any countable set has Lebesgue measure
zero, but there are also uncountable sets of zero Lebesgue measure. A partial 
converse to Theorem 1.8 also holds (see, for example, Saks (1964), Freedman 
(1983) p.209, for the following results). 


Theorem 1.21 (Lebesgue) A finite variation function g on [a,b] is differ- 
entiable almost everywhere on [a,b]. 


In what follows sufficient conditions for a function to be Lipschitz and not to 
be Lipschitz are given. 


1. If f is continuously differentiable on a finite interval [a,b], then it is
Lipschitz. Indeed, since f' is continuous on [a,b], it is bounded, $|f'| \le K$.
Therefore


$$|f(y) - f(x)| = \left| \int_x^y f'(t)\,dt \right| \le \int_x^y |f'(t)|\,dt \le K|x - y|. \tag{1.36}$$


2. If f is continuous and piecewise smooth then it is Lipschitz; the proof is
similar to the above.


3. A Lipschitz function does not have to be differentiable, for example
f(x) = |x| is Lipschitz but it is not differentiable at zero.


4. It follows from the definition of a Lipschitz function (1.28), that if it is
differentiable, then its derivative is bounded by K.


5. A Lipschitz function has finite variation on finite intervals, since for any
partition $\{t_i\}$ of a finite interval [a,b],


$$\sum_i |f(t_{i+1}) - f(t_i)| \le K \sum_i (t_{i+1} - t_i) = K(b - a). \tag{1.37}$$


6. As functions of finite variation have derivatives almost everywhere (with 
respect to Lebesgue measure), a Lipschitz function is differentiable al- 
most everywhere. 


(Note that functions of finite variation have derivatives which are inte- 
grable with respect to Lebesgue measure, but the function does not have 
to be equal to the integral of the derivative.) 


7. A Lipschitz function multiplied by a constant, and a sum of two Lipschitz 
functions are Lipschitz functions. The product of two bounded Lipschitz 
functions is again a Lipschitz function. 


8. If f is Lipschitz on [0, N] for any N > 0 but with the constant K depending
on N, then it is called locally Lipschitz. For example, $x^2$ is Lipschitz
on [0, N] for any finite N, but it is not Lipschitz on $[0, +\infty)$, since its
derivative is unbounded.


9. If f is a function of two variables f(x,t) and it satisfies the Lipschitz
condition in x for all t, $0 \le t \le T$, with the same constant K independent of t, it
is said that f satisfies the Lipschitz condition in x uniformly in t, $0 \le t \le T$.


A necessary and sufficient condition for a function f to be Riemann integrable 
was given by Lebesgue (see, for example, Saks (1964), Freedman (1983) p.208). 


Theorem 1.22 (Lebesgue) A necessary and sufficient condition for a func- 
tion f to be Riemann integrable on a finite closed interval [a,b] is that f is 
bounded on [a,b] and almost everywhere continuous on [a,b], that is, continu- 
ous at all points except possibly on a set of Lebesgue measure zero. 


Remark 1.6: (This is not used anywhere in the book, and directed only to 
readers with knowledge of Functional Analysis) 

Continuous functions on [a,b] with the supremum norm $\|h\| = \sup_{x \in [a,b]} |h(x)|$
form a Banach space, denoted C([a,b]). By a result in Functional Analysis, any
bounded linear functional on this space can be represented as $\int_{[a,b]} h(x)\,dg(x)$ for some
function g of finite variation. In this way, the Banach space of functions of
finite variation on [a,b] with the norm $\|g\| = V_g([a,b])$ can be identified with
the space of bounded linear functionals on the space of continuous functions, in other
words, the dual space of C([a,b]).


Chapter 2 


Concepts of Probability Theory


In this chapter we give fundamental definitions of probabilistic concepts. Since 
the theory is more transparent in the discrete case, it is presented first. The 
most important concepts not met in elementary courses are the models for
information, its flow and conditional expectation. This is only a brief description,
and a more detailed treatment can be found in many books on Probability
Theory, for example, Breiman (1968), Loève (1978), Durrett (1991). Conditional
expectation with its properties is central for further development, but some of 
the material in this chapter may be treated as an appendix. 


2.1 Discrete Probability Model 


A probability model consists of a filtered probability space on which variables 
of interest are defined. In this section we introduce a discrete probability 
model by using an example of discrete trading in stock. 


Filtered Probability Space 


A filtered probability space consists of: a sample space of elementary events, a 
field of events, a probability defined on that field, and a filtration of increasing 
subfields.


Sample Space


Consider a single stock with price $S_t$ at time $t = 1, 2, \ldots, T$. Denote by $\Omega$ the
set of all possible values of the stock during these times.


$$\Omega = \{\omega : \omega = (S_1, S_2, \ldots, S_T)\}.$$


If we assume that the stock price can go up by a factor u and down by
a factor d, then the relevant information reduces to the knowledge of the
movements at each time.


$$\Omega = \{\omega : \omega = (a_1, a_2, \ldots, a_T)\}, \quad a_t = u \text{ or } d.$$


To model uncertainty about the price in the future, we “list” all possible
future prices, and call them the possible states of the world. The unknown future is
just one of many possible outcomes, called the true state of the world. As
time passes more and more information is revealed about the true state of the
world. At time t = 1 we know prices $S_0$ and $S_1$. Thus the true state of the
world lies in a smaller set, a subset of $\Omega$, $A \subset \Omega$. After observing $S_1$ we know
which prices did not happen at time 1. Therefore we know that the true state
of the world is in A and not in $\Omega \setminus A = \bar{A}$.


Fields of Events 


Denote by $\mathcal{F}_t$ the information available to investors at time t, which consists
of stock prices before and at time t.

For example when T = 2, at t = 0 we have no information about $S_1$ and
$S_2$, and $\mathcal{F}_0 = \{\emptyset, \Omega\}$; all we know is that the true state of the world is in $\Omega$.
Consider the situation at t = 1. Suppose at t = 1 the stock went up by u. Then
we know that the true state of the world is in A, and not in its complement
$\bar{A}$, where

$$A = \{(u, S_2),\ S_2 = u \text{ or } d\} = \{(u,u), (u,d)\}.$$


Thus our information at time t = 1 is
$$\mathcal{F}_1 = \{\emptyset, \Omega, A, \bar{A}\}.$$


Note that $\mathcal{F}_0 \subset \mathcal{F}_1$, since we don’t forget the previous information.

At time t investors know which part of $\Omega$ contains the true state of the
world. $\mathcal{F}_t$ is called a field or algebra of sets.
$\mathcal{F}$ is a field if


1. $\emptyset, \Omega \in \mathcal{F}$.
2. If $A \in \mathcal{F}$ and $B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$, $A \cap B \in \mathcal{F}$, $A \setminus B \in \mathcal{F}$.

Example 2.1: (Examples of fields.)

It is easy to verify that any of the following is a field of sets.

1. $\{\Omega, \emptyset\}$ is called the trivial field $\mathcal{F}_0$.

2. $\{\emptyset, \Omega, A, \bar{A}\}$ is called the field generated by the set A, and denoted by $\mathcal{F}_A$.

3. $\{A : A \subseteq \Omega\}$, the field of all the subsets of $\Omega$. It is denoted by $2^{\Omega}$.


A partition of $\Omega$ is a collection of exhaustive and mutually exclusive subsets,


$$\{D_1, \ldots, D_k\}, \ \text{ such that } D_i \cap D_j = \emptyset \text{ for } i \ne j, \ \text{ and } \bigcup_i D_i = \Omega.$$


The field generated by the partition is the collection of all finite unions of the $D_i$’s
and their complements. These sets are like the basic building blocks for the
field. If $\Omega$ is finite then any field is generated by a partition.

If one field is included in the other, $\mathcal{F}_1 \subset \mathcal{F}_2$, then any set from $\mathcal{F}_1$ is also
in $\mathcal{F}_2$. In other words, a set from $\mathcal{F}_1$ is either a set or a union of sets from the
partition generating $\mathcal{F}_2$. This means that the partition that generates $\mathcal{F}_2$ has
“finer” sets than the ones that generate $\mathcal{F}_1$.
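
The construction of a field from a partition can be sketched in Python. This is a small illustration for a finite sample space only; the helper `field_from_partition` and the string labels for outcomes are assumptions made for the example.

```python
from itertools import combinations

def field_from_partition(blocks):
    """Return all unions of blocks of a finite partition (the empty union gives the empty set)."""
    blocks = [frozenset(b) for b in blocks]
    field = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            field.add(frozenset().union(*chosen))
    return field

# Two-period model: outcomes are labelled by the sequence of moves.
coarse = field_from_partition([{"uu", "ud"}, {"du", "dd"}])    # partition {A, complement of A}
fine = field_from_partition([{"uu"}, {"ud"}, {"du", "dd"}])    # a finer partition

print(len(coarse), len(fine))   # 4 8
print(coarse <= fine)           # True: the finer partition generates the larger field
```

A partition with k blocks generates a field with 2^k sets, and complements never need to be added separately, since the complement of a union of blocks is the union of the remaining blocks.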


Filtration 


A filtration $\mathbb{F}$ is the collection of fields,
$$\mathbb{F} = \{\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_t, \ldots, \mathcal{F}_T\}, \quad \mathcal{F}_t \subset \mathcal{F}_{t+1}.$$


$\mathbb{F}$ is used to model a flow of information. As time passes, an observer knows
more and more detailed information, that is, finer and finer partitions of $\Omega$.
In the example of the price of stock, $\mathbb{F}$ describes how the information about
prices is revealed to investors.


Example 2.2: $\mathbb{F} = \{\mathcal{F}_0, \mathcal{F}_A, 2^{\Omega}\}$ is an example of a filtration.


Stochastic Processes 


If $\Omega$ is a finite sample space, then a function X defined on $\Omega$ attaches numerical
values to each $\omega \in \Omega$. Since $\Omega$ is finite, X takes only finitely many values $x_i$,
$i = 1, \ldots, k$.

If a field of events $\mathcal{F}$ is specified, then any set in it is called a measurable
set. If $\mathcal{F} = 2^{\Omega}$, then any subset of $\Omega$ is measurable.

A function X on $\Omega$ is called $\mathcal{F}$-measurable or a random variable on $(\Omega, \mathcal{F})$
if all the sets $\{X = x_i\}$, $i = 1, \ldots, k$, are members of $\mathcal{F}$. This means that if we
have the information described by $\mathcal{F}$, that is, we know which event in $\mathcal{F}$ has
occurred, then we know which value of X has occurred. Note that if $\mathcal{F} = 2^{\Omega}$,
then any function on $\Omega$ is a random variable.

Example 2.3: Consider the model for trading in stock, t = 1, 2, where at each
time the stock can go up by the factor u or down by the factor d.

$\Omega = \{\omega_1 = (u,u), \omega_2 = (u,d), \omega_3 = (d,u), \omega_4 = (d,d)\}$. Take $A = \{\omega_1, \omega_2\}$, which is
the event that at t = 1 the stock goes up. $\mathcal{F}_1 = \{\emptyset, \Omega, A, \bar{A}\}$, and $\mathcal{F}_2 = 2^{\Omega}$ contains
all 16 subsets of $\Omega$. Consider the following functions on $\Omega$. $X(\omega_1) = X(\omega_2) = 1.5$,
$X(\omega_3) = X(\omega_4) = 0.5$. X is a random variable on $\mathcal{F}_1$. Indeed, the set $\{\omega : X(\omega) =
1.5\} = \{\omega_1, \omega_2\} = A \in \mathcal{F}_1$. Also $\{\omega : X(\omega) = 0.5\} = \bar{A} \in \mathcal{F}_1$. If $Y(\omega_1) = (1.5)^2$,
$Y(\omega_2) = 0.75$, $Y(\omega_3) = 0.75$, and $Y(\omega_4) = (0.5)^2$, then Y is not a random variable
on $\mathcal{F}_1$; it is not $\mathcal{F}_1$-measurable. Indeed, $\{\omega : Y(\omega) = 0.75\} = \{\omega_2, \omega_3\} \notin \mathcal{F}_1$. Y is
$\mathcal{F}_2$-measurable.
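
The measurability check of Example 2.3 can be carried out mechanically. In the sketch below the outcomes are labelled by strings, the helper `is_measurable` is hypothetical, and the numerical values of Y are taken to be 1.5², 0.75, 0.75 and 0.5² (an assumption; only the level sets of Y matter for measurability).

```python
from itertools import chain, combinations

omega = ["uu", "ud", "du", "dd"]
A = frozenset({"uu", "ud"})   # the stock goes up at t = 1

# F1 is generated by A; F2 is the field of all 16 subsets.
F1 = {frozenset(), frozenset(omega), A, frozenset(omega) - A}
F2 = {frozenset(s) for s in chain.from_iterable(combinations(omega, r) for r in range(5))}

def is_measurable(X, field):
    """True if every level set {X = x} belongs to the given field."""
    return all(frozenset(w for w in X if X[w] == v) in field for v in set(X.values()))

X = {"uu": 1.5, "ud": 1.5, "du": 0.5, "dd": 0.5}
Y = {"uu": 2.25, "ud": 0.75, "du": 0.75, "dd": 0.25}

print(is_measurable(X, F1))   # True
print(is_measurable(Y, F1))   # False: {ud, du} is not in F1
print(is_measurable(Y, F2))   # True
```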


Definition 2.1 A stochastic process is a collection of random variables $\{X(t)\}$.
For any fixed t, $t = 0, 1, \ldots, T$, X(t) is a random variable on $(\Omega, \mathcal{F}_T)$.

A stochastic process is called adapted to the filtration $\mathbb{F}$ if for all $t = 0, 1, \ldots, T$,
X(t) is a random variable on $\mathcal{F}_t$, that is, if X(t) is $\mathcal{F}_t$-measurable.


Example 2.4: (Example 2.3 continued.)

$X_1 = X$, $X_2 = Y$ is a stochastic process adapted to $\mathbb{F} = \{\mathcal{F}_1, \mathcal{F}_2\}$. This process
represents stock prices at time t under the assumption that the stock can appreciate
or depreciate by 50% in a unit of time.


Field Generated by a Random Variable 


Let $(\Omega, 2^{\Omega})$ be a sample space with the field of all events, and X be a random
variable with values $x_i$, $i = 1, 2, \ldots, k$. Consider the sets


$$A_i = \{\omega : X(\omega) = x_i\} \subseteq \Omega.$$


These sets form a partition of $\Omega$, and the field generated by this partition is
called the field generated by X. It is the smallest field that contains all the
sets of the form $A_i = \{X = x_i\}$ and it is denoted by $\mathcal{F}_X$ or $\sigma(X)$. The field
generated by X represents the information we can extract about the true state
$\omega$ by observing X.


Example 2.5: (Example 2.3 continued.)
$\{\omega : X(\omega) = 1.5\} = \{\omega_1, \omega_2\} = A$, $\{\omega : X(\omega) = 0.5\} = \{\omega_3, \omega_4\} = \bar{A}$.


$$\mathcal{F}_X = \mathcal{F}_1 = \{\emptyset, \Omega, A, \bar{A}\}.$$


Filtration Generated by a Stochastic Process 


Given $(\Omega, \mathcal{F})$ and a stochastic process $\{X(t)\}$, let $\mathcal{F}_t = \sigma(\{X_s, 0 \le s \le t\})$
be the field generated by the random variables $X_s$, $s = 0, \ldots, t$. It is all the
information available from the observation of the process up to time t. Clearly,
$\mathcal{F}_t \subseteq \mathcal{F}_{t+1}$, so that these fields form a filtration. This filtration is called the
natural filtration of the process $\{X(t)\}$.

If $A \in \mathcal{F}_t$ then by observing the process from 0 to t we know at time t
whether the true state of the world is in A or not. We illustrate this on our
financial example.


Example 2.6: Take T = 3, and assume that at each trading time the stock can go
up by the factor u or down by d.


              A            Ā
    B       u u u        d u u
    B       u u d        d u d
    B̄       u d u        d d u
    B̄       u d d        d d d


Look at the sets generated by information about $S_1$. This is a partition of $\Omega$, $\{A, \bar{A}\}$.
Together with the empty set and the whole set, this is the field $\mathcal{F}_1$. Sets generated
by information about $S_2$ are B and $\bar{B}$. Thus the sets formed by knowledge of $S_1$ and
$S_2$ form the partition of $\Omega$ consisting of all intersections of the above sets. All unions
of these sets, together with the empty set, form the field $\mathcal{F}_2$. Clearly any set in $\mathcal{F}_1$ is
also in $\mathcal{F}_2$, for example $A = (A \cap B) \cup (A \cap \bar{B})$. Similarly if we add information about
$S_3$ we obtain all the elementary sets, the $\omega$’s, and hence all subsets of $\Omega$, $\mathcal{F}_3 = 2^{\Omega}$. In
particular we will know the true state of the world at time t = 3.

$\mathcal{F}_0 \subset \mathcal{F}_1 \subset \mathcal{F}_2 \subset \mathcal{F}_3$ is the filtration generated by the price process $\{S_t, t = 1, 2, 3\}$.
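
The successive refinement in Example 2.6 can be sketched in Python. The helper `partition_at` is hypothetical; it assumes, as in the example, that the information available at time t is the sequence of the first t up or down moves.

```python
from itertools import product

def partition_at(t, T=3):
    """Partition of the set of paths {u, d}^T into blocks of paths sharing the first t moves."""
    blocks = {}
    for w in product("ud", repeat=T):
        blocks.setdefault(w[:t], set()).add("".join(w))
    return list(blocks.values())

for t in range(4):
    blocks = partition_at(t)
    # A partition with k blocks generates a field with 2^k sets.
    print(t, len(blocks), 2 ** len(blocks))
```

The number of blocks doubles at each step (1, 2, 4, 8), so the generated fields contain 2, 4, 16 and 256 sets, and every block at time t + 1 lies inside a block at time t.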


Predictable Processes 


Suppose that a filtration $\mathbb{F} = (\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_t, \ldots, \mathcal{F}_T)$ is given. A process
$H_t$ is called predictable (with respect to this filtration) if for each t, $H_t$ is
$\mathcal{F}_{t-1}$-measurable, that is, the value of the process H at time t is determined
by the information up to and including time t − 1. For example, the number
of shares held at time t is determined on the basis of information up to and
including time t − 1. Thus this process is predictable with respect to the
filtration generated by the stock prices.


Stopping Times 


$\tau$ is called a random time if it is a non-negative random variable on $(\Omega, \mathcal{F}_T)$, which can also
take the value $\infty$. Suppose that a filtration $\mathbb{F} = (\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_t, \ldots, \mathcal{F}_T)$
is given. $\tau$ is called a stopping time with respect to this filtration if for each
$t = 0, 1, \ldots, T$ the event

$$\{\tau \le t\} \in \mathcal{F}_t. \tag{2.1}$$

This means that by observing the information contained in $\mathcal{F}_t$ we can decide
whether the event $\{\tau \le t\}$ has or has not occurred. If the filtration $\mathbb{F}$ is
generated by $\{S_t\}$, then by observing the process up to time t, $S_0, S_1, \ldots, S_t$,
we can decide whether the event $\{\tau \le t\}$ has or has not occurred.
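
Condition (2.1) can be tested by brute force on a small model. The sketch below assumes the three-period up/down model with the filtration generated by the moves; the two random times `first_up` and `last_up` and the checker `is_stopping_time` are hypothetical names introduced for illustration.

```python
import math
from itertools import product

def is_stopping_time(tau, T=3):
    """Check that {tau <= t} is determined by the first t moves, for t = 0, ..., T."""
    paths = list(product("ud", repeat=T))
    for t in range(T + 1):
        decided = {}
        for w in paths:
            happened = tau(w) <= t
            # Paths with the same first t moves must agree on whether tau <= t.
            if decided.setdefault(w[:t], happened) != happened:
                return False
    return True

def first_up(w):
    """First time the stock moves up (infinity if it never does)."""
    return next((i + 1 for i, m in enumerate(w) if m == "u"), math.inf)

def last_up(w):
    """Last time the stock moves up (infinity if it never does)."""
    return max((i + 1 for i, m in enumerate(w) if m == "u"), default=math.inf)

print(is_stopping_time(first_up))   # True
print(is_stopping_time(last_up))    # False: it requires knowledge of future moves
```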


Probability 


If $\Omega$ is a finite sample space, then we can assign to each outcome $\omega$ a probability,
$P(\omega)$, that is, the likelihood of it occurring. This assignment can be arbitrary.
The only requirement is that $P(\omega) \ge 0$ and $\sum_{\omega} P(\omega) = P(\Omega) = 1$.


Example 2.7: Take T = 2 in our basic Example 2.3. If the stock goes up or down
independently of its value and if, say, the probability to go up is 0.4 then


    ω        (u,u)    (u,d)    (d,u)    (d,d)
    P(ω)     0.16     0.24     0.24     0.36


Distribution of a Random Variable 


Since a random variable X is a function from $\Omega$ to $\mathbb{R}$, and $\Omega$ is finite, X can
take only finitely many values, as the set $X(\Omega)$ is finite. Denote these values
by $x_i$, $i = 1, 2, \ldots, k$. The collection of probabilities $p_i$ of sets $\{X = x_i\} = \{\omega :
X(\omega) = x_i\}$ is called the probability distribution of X; for $i = 1, 2, \ldots, k$


$$p_i = P(X = x_i) = \sum_{\omega : X(\omega) = x_i} P(\omega).$$


Expectation 


If X is a random variable on $(\Omega, \mathcal{F})$ and P is a probability, then the expectation
of X with respect to P is


$$E_P X = \sum_{\omega} X(\omega) P(\omega),$$


where the sum is taken over all elementary outcomes $\omega$. It can be shown that
the expectation can be calculated by using the probability distribution of X,


$$E_P X = \sum_{i=1}^k x_i P(X = x_i).$$
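
The two ways of computing the expectation can be compared on the two-period model. The sketch below uses the probabilities of Example 2.7 and the random variable X of Example 2.3; the string labels for outcomes and the function names are assumptions made for the illustration.

```python
import math
from collections import defaultdict

P = {"uu": 0.16, "ud": 0.24, "du": 0.24, "dd": 0.36}
X = {"uu": 1.5, "ud": 1.5, "du": 0.5, "dd": 0.5}

def expectation(X, P):
    """Sum of X(w) P(w) over all elementary outcomes w."""
    return sum(X[w] * P[w] for w in P)

def distribution(X, P):
    """Probability distribution of X: value x_i -> P(X = x_i)."""
    dist = defaultdict(float)
    for w, p in P.items():
        dist[X[w]] += p
    return dict(dist)

def expectation_from_distribution(dist):
    """Sum of x_i P(X = x_i) over the values of X."""
    return sum(x * p for x, p in dist.items())

dist = distribution(X, P)
print(dist)
print(expectation(X, P), expectation_from_distribution(dist))
```

Both computations give 0.9 up to floating point rounding, since P(X = 1.5) = 0.4 and P(X = 0.5) = 0.6.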


Of course if the probability P is changed to another probability Q, then the
same random variable X may have a different probability distribution $q_i =
Q(X = x_i)$, and a different expected value, $E_Q X = \sum_{i=1}^k x_i q_i$. When the
probability P is fixed, or it is clear from the context with respect to which
probability P the expectation is taken, then the reference to P is dropped
from the notation, and the expectation is denoted simply by E(X) or EX.


Conditional Probabilities and Expectations


Let $(\Omega, 2^{\Omega}, P)$ be a finite probability space, and $\mathcal{G}$ be a field generated by a
partition of $\Omega$, $\{D_1, \ldots, D_k\}$, such that $D_i \cap D_j = \emptyset$ for $i \ne j$, and $\bigcup_i D_i = \Omega$. Recall
that if D is an event of positive probability, P(D) > 0, then the conditional
probability of an event A given the event D is defined by


$$P(A|D) = \frac{P(A \cap D)}{P(D)}.$$


Suppose that all the sets $D_i$ in the partition have a positive probability. The
conditional probability of the event A given the field $\mathcal{G}$ is the random variable
that takes values $P(A|D_i)$ on $D_i$, $i = 1, \ldots, k$. Let $I_D$ denote the indicator of
the set D, that is, $I_D(\omega) = 1$ if $\omega \in D$ and $I_D(\omega) = 0$ if $\omega \notin D$. Using this
notation, the conditional probability can be written as


$$P(A|\mathcal{G})(\omega) = \sum_{i=1}^k P(A|D_i) I_{D_i}(\omega). \tag{2.2}$$


For example, if $\mathcal{G} = \{\emptyset, \Omega\}$ is the trivial field, then
$$P(A|\mathcal{G}) = P(A|\Omega) = P(A).$$


Let now Y be a random variable that takes values $y_1, \ldots, y_k$; then the sets $D_i = \{\omega :
Y(\omega) = y_i\}$, $i = 1, \ldots, k$, form a partition of $\Omega$. If $\mathcal{F}_Y$ denotes the field generated
by Y, then the conditional probability given $\mathcal{F}_Y$ is denoted by


$$P(A|\mathcal{F}_Y) = P(A|Y).$$


It was assumed so far that all the sets in the partition have a positive
probability. If the partition contains a set of zero probability, call it N, then the
conditional probability is not defined on N by the above formula (2.2). It can
be defined for an $\omega \in N$ arbitrarily. Consequently any random variable which
is defined by (2.2) and is defined arbitrarily on N is a version of the conditional
probability. Any two versions only differ on N, which is a set of probability
zero.


Conditional Expectation 


In this section let X take values $x_1, \ldots, x_p$ and $A_1 = \{X = x_1\}, \ldots, A_p =
\{X = x_p\}$. Let the field $\mathcal{G}$ be generated by a partition $\{D_1, D_2, \ldots, D_k\}$ of $\Omega$.
Then the conditional expectation of X given $\mathcal{G}$ is defined by


$$E(X|\mathcal{G}) = \sum_{i=1}^p x_i P(A_i|\mathcal{G}).$$

Note that $E(X|\mathcal{G})$ is a linear combination of random variables, so that it is
a random variable. It is clear that $P(A|\mathcal{G}) = E(I_A|\mathcal{G})$, and $E(X|\mathcal{F}_0) = EX$,
where $\mathcal{F}_0 = \{\emptyset, \Omega\}$ is the trivial field.

By the definition of measurability, X is $\mathcal{G}$-measurable if and only if for
any i, $\{X = x_i\} = A_i$ is a member of $\mathcal{G}$, which means that it is either one
of the $D_j$’s or a union of some of the $D_j$’s. Since $X(\omega) = \sum_{i=1}^p x_i I_{A_i}(\omega)$, a
$\mathcal{G}$-measurable X can be written as $X(\omega) = \sum_{j=1}^k x_j I_{D_j}(\omega)$, where some $x_j$’s
may be equal. It is easy to see now that


$$\text{if } X \text{ is } \mathcal{G}\text{-measurable, then } E(X|\mathcal{G}) = X.$$


Note that since the conditional probabilities are defined up to a null set, so is 
the conditional expectation. 

If X and Y are random variables both taking a finite number of values,
then E(X|Y) is defined as $E(X|\mathcal{G})$, where $\mathcal{G} = \mathcal{F}_Y$ is the field generated by
the random variable Y. In other words if X takes values $x_1, \ldots, x_p$ and Y
takes values $y_1, \ldots, y_k$, and $P(Y = y_i) > 0$ for all $i = 1, \ldots, k$, then E(X|Y) is
a random variable, which takes values $\sum_{j=1}^p x_j P(X = x_j | Y = y_i)$ on the set
$\{Y = y_i\}$, $i = 1, \ldots, k$. These values are denoted by $E(X|Y = y_i)$. It is clear
from the definition that E(X|Y) is a function of Y,


$$E(X|Y)(\omega) = E(X|\mathcal{F}_Y)(\omega) = \sum_{i=1}^k \left( \sum_{j=1}^p x_j P(X = x_j | Y = y_i) \right) I_{\{Y = y_i\}}(\omega).$$
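
The definition of E(X|Y) on a finite space can be sketched in Python. The example assumes the two-period model with the probabilities of Example 2.7, with Y the price after the first step and X the price after the second step (values 1.5², 0.75, 0.75, 0.5², taken as an assumption); `cond_expectation` is a hypothetical helper. On each set {Y = y_i} it averages X with weights P(ω)/P(Y = y_i), which equals the sum of x_j P(X = x_j | Y = y_i).

```python
import math

P = {"uu": 0.16, "ud": 0.24, "du": 0.24, "dd": 0.36}
Y = {"uu": 1.5, "ud": 1.5, "du": 0.5, "dd": 0.5}        # price after the first step
X = {"uu": 2.25, "ud": 0.75, "du": 0.75, "dd": 0.25}    # price after the second step

def cond_expectation(X, Y, P):
    """Return E(X|Y) as a dict outcome -> value, assuming P(Y = y) > 0 for every value y."""
    result = {}
    for y in set(Y.values()):
        block = [w for w in P if Y[w] == y]              # the set D_i = {Y = y_i}
        p_block = sum(P[w] for w in block)               # P(Y = y_i)
        value = sum(X[w] * P[w] for w in block) / p_block
        for w in block:
            result[w] = value                            # constant on {Y = y_i}
    return result

E_X_given_Y = cond_expectation(X, Y, P)
print(E_X_given_Y)
# Averaging the conditional expectation recovers the expectation of X.
print(sum(E_X_given_Y[w] * P[w] for w in P), sum(X[w] * P[w] for w in P))
```

Here E(X|Y = 1.5) = 1.35 and E(X|Y = 0.5) = 0.45, and both printed averages equal 0.81 up to rounding.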


2.2 Continuous Probability Model 


In this section we define similar probabilistic concepts for a continuous sample 
space. We start with general definitions. 


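σ-Fields


A σ-field is a field which is closed with respect to countable unions and
countable intersections of its members, that is, a collection $\mathcal{F}$ of subsets of $\Omega$ that
satisfies


1. $\emptyset, \Omega \in \mathcal{F}$.
2. $A \in \mathcal{F} \Rightarrow \bar{A} \in \mathcal{F}$.
3. If $A_1, A_2, \ldots, A_n, \ldots \in \mathcal{F}$, then $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$ (and then also $\bigcap_{n=1}^{\infty} A_n \in \mathcal{F}$).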


Any subset B of $\Omega$ that belongs to $\mathcal{F}$ is called a measurable set.


Borel σ-field


The Borel σ-field is the most important example of a σ-field that is used in the
theory of functions, Lebesgue integration, and probability. Consider the
σ-field $\mathcal{B}$ on $\mathbb{R}$ ($\Omega = \mathbb{R}$) generated by the intervals. Informally, it is obtained by taking
all the intervals first, then including all the sets obtained from the intervals by
forming countable unions, countable intersections and complements, then
including countable unions, intersections and complements of these sets,
and so on. It can be shown that we end up with the smallest σ-field
which contains all the intervals. A rigorous definition follows. One can show
that the intersection of σ-fields is again a σ-field. Take the intersection of all
σ-fields containing the collection of intervals. It is the smallest σ-field containing
the intervals, the Borel σ-field on $\mathbb{R}$. In this model a measurable set is a set
from $\mathcal{B}$, a Borel set.


Probability 
A probability P on $(\Omega, \mathcal{F})$ is a set function on $\mathcal{F}$, such that


1. $P(\Omega) = 1$,


2. If $A \in \mathcal{F}$, then $P(\bar{A}) = 1 - P(A)$,


3. Countable additivity (σ-additivity): if $A_1, A_2, \ldots, A_n, \ldots \in \mathcal{F}$ are
mutually exclusive, then $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$.


The σ-additivity property is equivalent to finite additivity plus the continuity
property of probability, which states: if $A_1 \supseteq A_2 \supseteq \ldots \supseteq A_n \supseteq \ldots \supseteq A =
\bigcap_{n=1}^{\infty} A_n \in \mathcal{F}$, then


$$\lim_{n \to \infty} P(A_n) = P(A).$$
A similar property holds for an increasing sequence of events.

How can one define a probability on a σ-field? It is not hard to see that
it is impossible to assign probabilities to all individual $\omega$’s since there are too
many of them and $P(\{\omega\}) = 0$. On the other hand it is difficult to assign
probabilities to sets in $\mathcal{F}$ directly, since in general we don’t even know what a
set from $\mathcal{F}$ looks like. The standard way is to define the probability on a field
which generates the σ-field, and then extend it to the σ-field.


Theorem 2.2 (Carathéodory Extension Theorem) If a set function P
is defined on a field $\mathcal{F}$, satisfies $P(\Omega) = 1$, $P(\bar{A}) = 1 - P(A)$, and is countably
additive, then there is a unique extension of P to the σ-field generated by $\mathcal{F}$.


Lebesgue Measure


As an application of the above Theorem 2.2 we define the Lebesgue measure
on [0,1]. Let $\Omega = [0,1]$, and take for $\mathcal{F}$ the class of all finite unions of disjoint
intervals contained in [0,1]. It is clearly a field. Define the probability P(A)
on $\mathcal{F}$ by the length of A. It is not hard to show that P is σ-additive on $\mathcal{F}$.
Thus there is a unique extension of P to $\mathcal{B}$, the Borel σ-field generated by $\mathcal{F}$.
This extension is the Lebesgue measure on $\mathcal{B}$. It is also a probability on $\mathcal{B}$,
since the length of [0,1] is one.

Any point has Lebesgue measure zero. Indeed, $\{x\} = \bigcap_n (x - 1/n, x +
1/n)$. Therefore $P(\{x\}) = \lim_{n \to \infty} 2/n = 0$. By countable additivity it follows
that any countable set has Lebesgue measure zero. In particular the set of
rationals on [0,1] is of zero Lebesgue measure. The set of irrationals on [0,1]
has Lebesgue measure 1.

The term “almost everywhere” (for “almost all x”) means everywhere (for 
all x) except, perhaps, a set of Lebesgue measure zero. 


Random Variables 


A random variable X on $(\Omega, \mathcal{F})$ is a measurable function from $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B})$,
where $\mathcal{B}$ is the Borel σ-field on the line. This means that for any Borel set
$B \in \mathcal{B}$ the set $\{\omega : X(\omega) \in B\}$ is a member of $\mathcal{F}$. Instead of verifying the
definition for all Borel sets, it is enough to have that for all real x the set
$\{\omega : X(\omega) \le x\} \in \mathcal{F}$. In simple words, for a random variable we can assign
probabilities to sets of the form $\{X \le x\}$, and $\{a < X \le b\}$.


Example 2.8: Take $\Omega = \mathbb{R}$ with the Borel σ-field $\mathcal{B}$. By a measurable function on
$\mathbb{R}$ is usually understood a $\mathcal{B}$-measurable function, that is, a random variable
on $(\mathbb{R}, \mathcal{B})$. To define a probability on $\mathcal{B}$, take $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ and define $P(A) =
\int_A f(x)\,dx$ for any interval A. It is easy to show that P so defined is a probability on
the algebra containing the intervals, and it is continuous at $\emptyset$. Thus it extends
uniquely to $\mathcal{B}$. The function X(x) = x on this probability space is called the standard
Normal random variable.


An important question is how to describe measurable functions, and how to
decide whether a given function is measurable. It is easy to see that indicators
of measurable sets are measurable. (An indicator of a set A is defined by
$I_A(\omega) = 1$ if $\omega \in A$ and $I_A(\omega) = 0$ otherwise.) Conversely, if an indicator of a set is
measurable then the set is measurable, $A \in \mathcal{F}$. A simple function (simple
random variable) is a finite linear combination of indicators of measurable sets.
By definition, a simple function takes finitely many values and is measurable.


Theorem 2.3 X is a random variable on $(\Omega, \mathcal{F})$ if and only if it is a simple
function or a limit of simple functions.

The sufficiency part follows from the example below. The necessity is not hard
to prove, by establishing that the supremum and infimum of random variables
are random variables.


Example 2.9: The following sequence of simple functions approximates a random
variable $X$.

$$X_n(\omega) = \sum_{k=-n2^n}^{n2^n-1} \frac{k}{2^n}\, I_{\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right)}(X(\omega)).$$

These variables are constructed by taking the interval $[-n, n]$ and dividing it into
$n2^{n+1}$ equal parts. $X_n$ is zero on the set where $X \ge n$ or $X < -n$. On the set where
the values of $X$ belong to the interval $[\frac{k}{2^n}, \frac{k+1}{2^n})$, $X$ is replaced by $\frac{k}{2^n}$, its smallest
value on that set. Note that all the sets $\{\omega : \frac{k}{2^n} \le X(\omega) < \frac{k+1}{2^n}\}$ are measurable,
by definition of a random variable. It is easy to see that the $X_n$'s are increasing
where $X \ge 0$, and that $|X_n(\omega) - X(\omega)| \le 2^{-n}$ as soon as $n > |X(\omega)|$. Therefore,
for all $\omega$, $X(\omega) = \lim_{n\to\infty} X_n(\omega)$.

This example is due to Lebesgue, who gave it for non-negative functions, demonstrating
that a measurable function $X$ is a limit of a monotone sequence of simple
functions $X_n$, $X_{n+1} \ge X_n$.
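The construction in Example 2.9 can be sketched in a few lines of Python. This is only an illustration: the function name `dyadic_approx` is hypothetical, and it evaluates $X_n$ at a point where $X$ takes the value `x`, rather than working with $X$ as a function on the sample space.

```python
import math

def dyadic_approx(x: float, n: int) -> float:
    """Value of the simple function X_n at a point where X equals x.

    Returns k / 2**n when k / 2**n <= x < (k + 1) / 2**n and -n <= x < n,
    and 0 when x >= n or x < -n, following Example 2.9.
    """
    if x < -n or x >= n:
        return 0.0
    k = math.floor(x * 2**n)  # index of the dyadic interval containing x
    return k / 2**n

# The approximation error is below 2**(-n) once n exceeds |x|.
for n in (1, 2, 5, 10):
    print(n, dyadic_approx(0.3, n))
```

For a non-negative value such as 0.3 the printed values increase with `n`, in line with the monotone approximation attributed to Lebesgue.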


The next result states that a limit of random variables and a composition of 
random variables is again a random variable. 


Theorem 2.4 


1. If $X_n$ are random variables on $(\Omega, \mathcal{F})$ and $X(\omega) = \lim_{n\to\infty} X_n(\omega)$, then
$X$ is also a random variable.


2. If $X$ is a random variable on $(\Omega, \mathcal{F})$ and $g$ is a $\mathcal{B}$-measurable function,
then $g(X)$ is also a random variable.


Remark 2.1: In the above theorem the requirement that the limit $X(\omega) =
\lim_{n\to\infty} X_n(\omega)$ exists for all $\omega$ can be replaced by its existence for all $\omega$ outside
a set of probability zero, and on a subsequence. Such a limit is a limit in probability,
which is introduced later.

σ-field Generated by a Random Variable

The σ-field generated by a random variable $X$ is the smallest σ-field containing
sets of the form $\{\omega : a \le X(\omega) \le b\}$, for any $a, b \in \mathbb{R}$.

Distribution of a Random Variable

The probability distribution function of $X$ is defined as

$$F(x) = F_X(x) = P(X \le x).$$

It follows from the properties of probability that $F$ is non-decreasing and
right-continuous. Due to monotonicity, it is of finite variation.

Joint Distribution

If $X$ and $Y$ are random variables defined on the same space, then their joint
distribution function is defined as

$$F(x, y) = P(X \le x, Y \le y),$$

for any choice of real numbers $x$ and $y$.

The distribution of $X$ is recovered from the joint distribution of $X$ and $Y$
by $F_X(x) = F(x, \infty)$, and similarly the distribution of $Y$ is given by $F_Y(y) = F(\infty, y)$;
they are called the marginal distributions.

The joint distribution of $n$ random variables $X_1, X_2, \ldots, X_n$ is defined by

$$P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n).$$

The collection of random variables $X = (X_1, X_2, \ldots, X_n)$ is referred to as
a random vector $X$, and the joint distribution as a multivariate probability
distribution of $X$. One can consider $X$ as an $\mathbb{R}^n$-valued random variable, and it
is possible to prove that $X$ is an $\mathbb{R}^n$-valued random variable if and only if all
of its components are random variables.


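Transformation of Densities

A random vector $X$ has a density $f(x) = f(x_1, x_2, \ldots, x_n)$ if for any set $B$ (a
Borel subset of $\mathbb{R}^n$),

$$P(X \in B) = \int_{x \in B} f(x)\, dx_1 dx_2 \ldots dx_n. \tag{2.3}$$

If $X$ is transformed into $Y$ by a transformation $y = y(x)$, i.e.

$$\begin{aligned}
y_1 &= y_1(x_1, x_2, \ldots, x_n) \\
y_2 &= y_2(x_1, x_2, \ldots, x_n) \\
&\;\;\vdots \\
y_n &= y_n(x_1, x_2, \ldots, x_n)
\end{aligned}$$

then, provided this transformation is one-to-one, and the inverse transformation
has a non-vanishing Jacobian

$$J = \det \begin{pmatrix}
\frac{\partial x_1}{\partial y_1} & \frac{\partial x_1}{\partial y_2} & \cdots & \frac{\partial x_1}{\partial y_n} \\
\frac{\partial x_2}{\partial y_1} & \frac{\partial x_2}{\partial y_2} & \cdots & \frac{\partial x_2}{\partial y_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial x_n}{\partial y_1} & \frac{\partial x_n}{\partial y_2} & \cdots & \frac{\partial x_n}{\partial y_n}
\end{pmatrix},$$

$Y$ has a density given by

$$f_Y(y) = f(x_1(y), x_2(y), \ldots, x_n(y))\,|J(y)|. \tag{2.4}$$

This is easily established by using the change of variables in multiple integrals
(see Example 2.15 for calculation of the bivariate density).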
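As a one-dimensional illustration of formula (2.4), the sketch below takes $Y = e^X$ with $X$ standard Normal, so that the inverse transformation is $x = \log y$ with Jacobian $1/y$. The choice of transformation, the helper names and the midpoint rule are assumptions made for this example only. The resulting density is checked against $P(Y \le c) = P(X \le \log c)$.

```python
import math

def normal_pdf(x: float) -> float:
    """Standard Normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_cdf(x: float) -> float:
    """Standard Normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def transformed_pdf(y: float) -> float:
    """Density of Y = exp(X) from (2.4): f(x(y)) * |J(y)| with x(y) = log y, J = 1/y."""
    if y <= 0:
        return 0.0
    return normal_pdf(math.log(y)) * abs(1 / y)

def midpoint_integral(f, a: float, b: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

c = 2.0
print(midpoint_integral(transformed_pdf, 0.0, c))  # P(Y <= c) from the density
print(normal_cdf(math.log(c)))                     # P(X <= log c) directly
```

The two printed numbers should agree to several decimal places, which is what (2.4) predicts.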


2.3 Expectation and Lebesgue Integral 


Let $X$ be a random variable on $(\Omega, \mathcal{F})$, and $P$ be a probability on $\mathcal{F}$. Recall
that in the discrete case the expectation of $X$ is defined as $\sum_\omega X(\omega)P(\omega)$. The
expectation in the continuous model is defined by means of an integral

$$EX = \int_\Omega X(\omega)\,dP(\omega).$$

The expectation is defined for positive random variables first. The general case
is obtained by using the decomposition $X = X^+ - X^-$, where $X^+ = \max(X, 0)$
and $X^- = \max(-X, 0)$, and letting $EX = EX^+ - EX^-$, provided both $EX^+$
and $EX^-$ are finite.

If $X \ge 0$ is a random variable, then it can be approximated by simple
random variables (see Theorem 2.3 and Example 2.9). The expectation of a
simple random variable is defined as a sum, that is if

$$X = \sum_{k=1}^n c_k I_{A_k}, \quad \text{then} \quad EX = \sum_{k=1}^n c_k P(A_k).$$

Note that for a simple random variable, $X \ge 0$ implies $EX \ge 0$. This in turn
implies that if $X \ge Y$, where $X$ and $Y$ are simple random variables, then
$EX \ge EY$.

Any positive random variable $X$ can be approximated by an increasing
sequence $X_n$ of simple random variables; such an approximation is given in
Example 2.9. It now follows that since $X_n$ is an increasing sequence, $EX_n$ is also
an increasing sequence, hence has a limit. The limit of $EX_n$ is taken to be
$EX$. It can be shown that this limit does not depend on the choice of the
approximating sequence of simple random variables, so that the expectation
is defined unambiguously.


Definition 2.5 A random variable $X$ is called integrable if both $EX^+$ and
$EX^-$ are finite. In this case $EX = EX^+ - EX^-$.

Note that for $X$ to be integrable both $EX^+$ and $EX^-$ must be finite, which is
the same as $E|X| = EX^+ + EX^- < \infty$.

Lebesgue-Stieltjes Integral

A distribution function on $\mathbb{R}$ is a non-decreasing right-continuous function
which approaches 0 at $-\infty$ and 1 at $+\infty$. Such a distribution function, say
$F$, defines uniquely a probability on the Borel σ-field $\mathcal{B}$ by setting $P((a, b]) =
F(b) - F(a)$.

Take now $(\Omega, \mathcal{F}) = (\mathbb{R}, \mathcal{B})$ and a probability on $\mathcal{B}$ given by a distribution
function $F(x)$. A random variable on this space is a measurable function $f(x)$.
Its expected value is written as $\int_{\mathbb{R}} f(x)F(dx)$ and is called the
Lebesgue-Stieltjes integral of $f$ with respect to $F$.

The distribution function $F$ can be replaced by any function of finite
variation, giving rise to the general Lebesgue-Stieltjes integral.

The probability distribution of a random variable $X$ on $(\Omega, \mathcal{F})$ is the
probability on $\mathcal{B}$ carried from $\mathcal{F}$ by $X$: for any $B \in \mathcal{B}$,

$$P_X(B) = P(X \in B). \tag{2.5}$$

The distribution function is related to this probability by $F(x) = P_X((-\infty, x])$.
Equation (2.5) gives the relation between the expectations of indicator functions,

$$\int_\Omega I(X(\omega) \in B)\,dP(\omega) = \int_{-\infty}^{\infty} I(x \in B)\,P_X(dx).$$

This can be extended from indicators to measurable functions, using an
approximation by simple functions, and we have the following result.

Theorem 2.6 If $X$ is a random variable with distribution function $F(x)$, and
$h$ is a measurable function on $\mathbb{R}$, such that $h(X)$ is integrable, then

$$Eh(X) := \int_\Omega h(X(\omega))\,dP(\omega) = \int_{-\infty}^{\infty} h(x)\,P_X(dx) := \int_{-\infty}^{\infty} h(x)\,F(dx). \tag{2.6}$$


Lebesgue Integral on the Line 


The Lebesgue-Stieltjes integral with respect to F(x) = x is known as the 
Lebesgue integral. 


Example 2.10: Let $\Omega = [0, 1]$, its elements $\omega$ are real numbers $x$, and take for
probability the Lebesgue measure. Take $X(\omega) = X(x) = x^2$. Then $EX = \int_0^1 x^2\,dx =
1/3$. Construct an approximating sequence of simple functions and verify the value
of the above integral.
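One way to carry out the verification suggested in Example 2.10 is sketched below, assuming the dyadic simple functions $X_n$ of Example 2.9. For $X(x) = x^2$ on $[0,1]$ the set $\{k/2^n \le X < (k+1)/2^n\}$ is the interval $[\sqrt{k/2^n}, \sqrt{(k+1)/2^n})$, whose Lebesgue measure is the difference of the square roots. The function name is hypothetical.

```python
import math

def simple_expectation(n: int) -> float:
    """E(X_n) for X(x) = x**2 on [0, 1] with Lebesgue measure, n >= 1.

    X_n takes the value k / 2**n on the set where k / 2**n <= X < (k + 1) / 2**n,
    so E(X_n) is the sum of value times probability over k = 0, ..., 2**n - 1.
    """
    total = 0.0
    for k in range(2**n):
        lo = k / 2**n
        hi = (k + 1) / 2**n
        prob = math.sqrt(hi) - math.sqrt(lo)  # measure of {x : lo <= x**2 < hi}
        total += lo * prob
    return total

for n in (1, 2, 4, 8, 12):
    print(n, simple_expectation(n))
```

The printed values increase with `n` and approach 1/3 from below, with an error of at most $2^{-n}$ because $X_n \le X \le X_n + 2^{-n}$ almost everywhere.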

Similarly, for any continuous function $f(x)$ on $[0,1]$, $X(\omega) = X(x) = f(x)$ is a
random variable (using that a continuous function is measurable) with expectation
$EX = Ef = \int_0^1 f(x)\,dx$.

Theorem 2.7 If $f$ is Riemann integrable on $[a,b]$, then it is Lebesgue
integrable on $[a,b]$ and the integrals coincide.


On the other hand there are functions which are Lebesgue integrable but not
Riemann integrable. Recall that for a function to be Riemann integrable, it
must be continuous at all points except for a set of Lebesgue measure zero.
Some everywhere discontinuous functions are Lebesgue integrable.


Example 2.11: $\Omega = [0, 1]$, and probability is given by the Lebesgue measure.
Let $X(x) = I_Q(x)$ be the indicator function of the set $Q$ of all rationals. $Q$ has
Lebesgue measure zero. As the expectation of an indicator is the probability of its
set, $EX = \int_0^1 I_Q(x)\,dx = 0$. However, $I_Q(x)$ is discontinuous at every point, so that
the set of discontinuities of $I_Q(x)$ is $[0,1]$ which has Lebesgue measure 1, therefore
$I_Q(x)$ is not Riemann integrable.


The next result is the fundamental theorem for the Lebesgue integral on the 
line. 


Theorem 2.8 If $f$ is Lebesgue integrable on $[a,b]$ then the derivative of the
integral exists for almost all $x \in (a,b)$, and

$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x). \tag{2.7}$$


Properties of Expectation (Lebesgue Integral) 


It is not hard to show that the expectation (Lebesgue Integral) satisfies the 
following properties 


1. Linearity. If $X$ and $Y$ are integrable and $\alpha$ and $\beta$ are constants, then
$E(\alpha X + \beta Y) = \alpha EX + \beta EY$.


2. If random variables $X$ and $Y$ satisfy $|X| \le Y$ and $Y$ is integrable, then
$X$ is also integrable and $E|X| \le EY$.


3. If a random variable $X \ge 0$ then $EX = 0$ if and only if $P(X = 0) = 1$.


Jumps and Probability Densities 


The jump of $F$ at $x$ gives the probability $P(X = x)$, $F(x) - F(x-) = P(X = x)$.
Since $F$ is right-continuous it has at most countably many jumps.


Definition 2.9 $F$ is called discrete if it changes only by jumps.
If $F(x)$ is continuous at $x$ then $P(X = x) = 0$.


Definition 2.10 $F(x)$ is called absolutely continuous if there is a function
$f(x) \ge 0$, such that for all $x$, $F(x)$ is given by the Lebesgue integral $F(x) =
\int_{-\infty}^x f(t)\,dt$. In this case $F'(x) = f(x)$ for almost all $x$ (Theorem 2.8).


$f$ is called the probability density function of $X$. It follows from the definition
that for any $a < b$

$$P(a < X \le b) = \int_a^b f(x)\,dx.$$

There are plenty of examples in any introductory book on probability or
statistics of continuous random variables with densities: Normal, Exponential,
Uniform, Gamma, Cauchy, etc.

The random variables $X$ and $Y$ with the joint distribution $F(x,y)$ possess
a density $f(x,y)$ (with respect to the Lebesgue measure) if for any $x, y$

$$F(x, y) = \int_{-\infty}^x \int_{-\infty}^y f(u, v)\,du\,dv,$$

and then for almost all (with respect to the Lebesgue measure on the plane)
$x, y$,

$$f(x, y) = \frac{\partial^2 F}{\partial x\,\partial y}(x, y).$$


A density for an n-dimensional random vector is defined similarly. 


Decomposition of Distributions and FV Functions 


Any distribution function can be decomposed into a continuous part and a 
jump part. Continuous distribution functions can be decomposed further 
(Lebesgue decomposition). 

If $F$ is a continuous distribution function, then it can be decomposed into a
sum of two continuous distribution functions, the absolutely continuous part
and the singular part, i.e. for some $0 \le a \le 1$

$$F = aF_{ac} + (1 - a)F_{sing}. \tag{2.8}$$

$F_{ac}$ is characterized by its density that exists at almost all points. For the
singular part, $F'_{sing}(x)$ exists for almost all $x$ and is zero. An example of such a
function is the Cantor function, see Example 2.13, where the distribution
function is constant between the points of the Cantor set. In most applications in
statistics continuous distributions are absolutely continuous with zero singular
part.

Example 2.12: (Cantor set.) Consider the set $\{x : x = \sum_{n=1}^{\infty} a_n/3^n,\ a_n \in \{0, 2\}\}$.
It is possible to show that this set does not have isolated points (a perfect set), that
is, any point of the set is a limit of other points. Indeed, for a given sequence of
$a_n$'s that contains infinitely many 2s, consider a new sequence which is the same up
to the $m$-th term with all the rest being zeros. The distance between these two
numbers is given by $\sum_{n=m+1}^{\infty} a_n/3^n \le 3^{-m}$, which can be made arbitrarily small as
$m$ increases. For a number with finitely many 2s, say $k$ 2s, the numbers with the
first $k$ same places, and the rest zeros except the $m$-th place which is 2, approximate
it. Indeed, the distance between these two numbers is $2/3^m$. It is also not hard to
see that this set is uncountable (by the diagonal argument) and that it has Lebesgue
measure zero. Although the Cantor set seems to be artificially constructed, Cantor
type sets arise naturally in the study of Brownian motion; for example, the set of zeros
of Brownian motion is a Cantor type set.


Example 2.13: (Cantor distribution.)

The distribution function $F$ of the random variable $X = \sum_{n=1}^{\infty} a_n/3^n$, where $a_n$
are independent identically distributed random variables taking values 0 and 2 with
probability 1/2, is continuous and its derivative is zero almost everywhere.
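The distribution function of Example 2.13 can be evaluated from the ternary expansion of its argument. The sketch below is an illustration under stated assumptions: the function name and the truncation depth are hypothetical, and exact rational arithmetic from Python's `fractions` module is used to avoid rounding. It reads ternary digits of $x$ one at a time; a digit 2 contributes a binary digit 1 to $F(x)$, and the first digit 1 means that $x$ lies in a removed middle third, on which $F$ is constant.

```python
from fractions import Fraction

def cantor_cdf(x, depth: int = 40):
    """Approximate F(x) = P(X <= x) for the Cantor distribution on [0, 1].

    Ternary digits 0 and 2 of x map to binary digits 0 and 1 of F(x);
    the first ternary digit 1 ends the expansion, since F is flat on that gap.
    The result is exact when a digit 1 is met, otherwise truncated after `depth` digits.
    """
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    result = Fraction(0)
    weight = Fraction(1, 2)      # value of the current binary digit of F(x)
    for _ in range(depth):
        x *= 3
        digit = int(x)           # next ternary digit of x: 0, 1 or 2
        x -= digit
        if digit == 1:
            return result + weight
        if digit == 2:
            result += weight
        weight /= 2
    return result

print(cantor_cdf(Fraction(1, 3)), cantor_cdf(Fraction(1, 2)), cantor_cdf(Fraction(2, 3)))
```

All three printed values equal 1/2, showing that $F$ does not change across the removed interval $(1/3, 2/3)$, which is the mechanism behind the zero derivative almost everywhere.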


It takes a rather pathological example to construct a continuous singular distribution 
in one dimension. In dimension two such examples can be simple. 


Example 2.14: (Continuous singular distributions on the plane.) 


Take $F$ such that $\frac{\partial^2 F}{\partial x\,\partial y} = 0$ almost everywhere on the plane. If $F$ is a linear function
in $x$ and $y$, or a distribution function that does not depend on one of the variables $x$
or $y$, then it is singular. For example, $0 \le X, Y \le 1$ such that their joint distribution
is determined by $F(x, y) = \frac{1}{2}(x + y)$, for $x, y$ satisfying $0 \le x, y \le 1$. In this case
only sets that have non-empty intersection with the axis have positive probability.


Functions of finite variation have a similar structure to distribution functions.
They can be decomposed into a continuous part and a jump part, and
the continuous part can be decomposed further into an absolutely continuous 
part and a singular part. 


2.4 Transforms and Convergence 


If $X^k$ is integrable, $E|X|^k < \infty$, then the $k$-th moment of $X$ is defined as
$E(X^k)$. The moment generating function of $X$ is defined as

$$m(t) = E(e^{tX}),$$

provided $e^{tX}$ is integrable, for $t$ in a neighbourhood of 0.
Using the series expansion for the exponential function

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!},$$

we can formally write, by interchanging summation and the expectation,

$$m(t) = Ee^{tX} = E\sum_{n=0}^{\infty} \frac{t^n X^n}{n!} = \sum_{n=0}^{\infty} \frac{t^n}{n!}E(X^n). \tag{2.9}$$

Thus $E(X^n)$ can be obtained from the power series expansion of the moment
generating function.
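As an illustration of reading moments off the power series (2.9), take the standard Normal distribution, whose moment generating function is $m(t) = e^{t^2/2}$ (the case $\mu = 0$, $\sigma = 1$ of the formula given in Section 2.6). Expanding the exponential gives $m(t) = \sum_k t^{2k}/(2^k k!)$, and matching coefficients with (2.9) gives $E(X^n) = n!$ times the coefficient of $t^n$. The function names below are hypothetical.

```python
from fractions import Fraction
from math import factorial

def mgf_coefficient(n: int) -> Fraction:
    """Coefficient of t**n in exp(t**2 / 2) = sum over k of t**(2k) / (2**k * k!)."""
    if n % 2 == 1:
        return Fraction(0)       # only even powers of t appear
    k = n // 2
    return Fraction(1, 2**k * factorial(k))

def standard_normal_moment(n: int) -> Fraction:
    """E(X**n) for X ~ N(0, 1): by (2.9) the coefficient of t**n equals E(X**n) / n!."""
    return factorial(n) * mgf_coefficient(n)

print([int(standard_normal_moment(n)) for n in range(9)])
```

The odd moments vanish and the even ones are 1, 3, 15, 105, the familiar values $(2k)!/(2^k k!)$.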
The characteristic function of $X$ is defined as

$$\phi(t) = E(e^{itX}) = E(\cos(tX)) + iE(\sin(tX)),$$

where $i = \sqrt{-1}$. The characteristic function determines the distribution
uniquely, so does the moment generating function when it exists on an
interval containing 0. The advantage of the characteristic function over the
moment generating function is that it exists for any random variable $X$, since
the functions $\cos(tx)$ and $\sin(tx)$ are bounded for any $t$ on the whole line,
whereas $e^{tx}$ is unbounded and the moment generating function need not exist.
Existence of the moment generating function around zero implies existence
of all the moments. If $X$ does not have all the moments, then its moment
generating function does not exist.


Convergence of Random Variables 


There are four main concepts of convergence of a sequence of random variables. 
We give the definitions of progressively stronger concepts and some results on 
their relations. 


Definition 2.11 $\{X_n\}$ converge in distribution to $X$, if their distribution
functions $F_n(x)$ converge to the distribution function $F(x)$ at any point of
continuity of $F$.


It can be shown that $\{X_n\}$ converge in distribution to $X$ if and only if their
characteristic functions (or moment generating functions) converge to that of
$X$. Convergence in distribution is also equivalent to the requirement that
$Eg(X_n) \to Eg(X)$ as $n \to \infty$ for all bounded continuous functions $g$ on $\mathbb{R}$.


Definition 2.12 $\{X_n\}$ converge in probability to $X$ if for any $\epsilon > 0$,
$P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$.


Definition 2.13 $\{X_n\}$ converge almost surely (a.s.) to $X$ if for any $\omega$ outside
a set of zero probability $X_n(\omega) \to X(\omega)$ as $n \to \infty$.


Almost sure convergence implies convergence in probability, which in turn
implies convergence in distribution. It is also not hard to see that convergence
in distribution to a constant is the same as the convergence in probability
to the same constant. Convergence in probability implies the almost sure
convergence on a subsequence, namely, if $\{X_n\}$ converge in probability to $X$
then there is a subsequence $n_k$ such that $X_{n_k}$ converges almost surely to the same limit.

$L^r$-convergence (convergence in the $r$-th mean), $r \ge 1$, is defined as follows.

Definition 2.14 $\{X_n\}$ converge to $X$ in $L^r$ if for any $n$ $E(|X_n|^r) < \infty$, and
$E(|X_n - X|^r) \to 0$ as $n \to \infty$.


Using the concept of uniform integrability, given later in Chapter 7,
convergence in $L^r$ is equivalent to convergence in probability and uniform
integrability of $|X_n|^r$ (see for example, Loeve (1978) p.164).

The following result, which is known as Slutskii theorem, is frequently used
in applications.


Theorem 2.15 If $X_n$ converges to $X$ and $Y_n$ converges to $Y$, then $X_n +
Y_n$ converges to $X + Y$, for any type of stochastic convergence, except for
convergence in distribution. However, if $Y = 0$ or $X_n$ and $Y_n$ are independent,
then the result is also true for convergence in distribution.


Convergence of Expectations


Theorem 2.16 (Monotone convergence) If $X_n \ge 0$, and $X_n$ are increasing
to a limit $X$, which may be infinite, then $\lim_{n\to\infty} EX_n = EX$.


Theorem 2.17 (Fatou's lemma) If $X_n \ge 0$ (or $X_n \ge c > -\infty$), then
$E(\liminf_n X_n) \le \liminf_n EX_n$.


Theorem 2.18 (Dominated Convergence) If $\lim_{n\to\infty} X_n = X$ in probability
and for all $n$ $|X_n| \le Y$ with $EY < \infty$, then $\lim_{n\to\infty} EX_n = EX$.


2.5 Independence and Covariance 


Independence 


Two events $A$ and $B$ are called independent if $P(A \cap B) = P(A)P(B)$.
A collection of events $A_i$, $i = 1, 2, \ldots$ is called independent if for any finite
$n$ and any choice of indices $i_k$, $k = 1, 2, \ldots, n$

$$P\left(\bigcap_{k=1}^n A_{i_k}\right) = \prod_{k=1}^n P(A_{i_k}).$$

Two σ-fields are called independent if for any choice of sets from each of them,
these sets are independent.

Two random variables $X$ and $Y$ are independent if the σ-fields they
generate are independent. It follows that their joint distribution is given by the
product of their marginal distributions (since the sets $\{X \le x\}$ and $\{Y \le y\}$
are in the respective σ-fields)

$$P(X \le x, Y \le y) = P(X \le x)P(Y \le y),$$

and it can be seen that this is an equivalent property.
One can formulate the independence property in terms of expectations by
writing the above in terms of indicators

$$E(I(X \le x)I(Y \le y)) = E(I(X \le x))\,E(I(Y \le y)).$$

Since it is possible to approximate indicators by continuous bounded functions,
$X$ and $Y$ are independent if and only if for any bounded continuous functions
$f$ and $g$,

$$E(f(X)g(Y)) = E(f(X))\,E(g(Y)).$$

$X_1, X_2, \ldots, X_n$ are called independent if for any choice of random
variables $X_{i_1}, X_{i_2}, \ldots, X_{i_k}$ their joint distribution is given by the product of their
marginal distributions (alternatively, if the σ-fields they generate are
independent).


Covariance 


The covariance of two integrable random variables $X$ and $Y$ is defined,
provided $XY$ is integrable, by

$$\mathrm{Cov}(X, Y) = E(X - EX)(Y - EY) = E(XY) - EXEY. \tag{2.10}$$

The variance of $X$ is the covariance of $X$ with itself, $\mathrm{Var}(X) = \mathrm{Cov}(X, X)$.
The Cauchy-Schwarz inequality

$$(E|XY|)^2 \le E(X^2)E(Y^2), \tag{2.11}$$

assures that covariance exists for square integrable random variables.
Covariance is symmetric,

$$\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X),$$

and is linear in both variables (bilinear)

$$\mathrm{Cov}(aX + bY, Z) = a\,\mathrm{Cov}(X, Z) + b\,\mathrm{Cov}(Y, Z).$$

Using this property with $X + Y$ we obtain the formula for the variance of the
sum. The following property of the covariance holds

$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y). \tag{2.12}$$

Random variables $X$ and $Y$ are called uncorrelated if $\mathrm{Cov}(X, Y) = 0$. It is
easy to see from the definitions that for independent random variables

$$E(XY) = EXEY,$$

which implies that they are uncorrelated. The opposite implication is not true
in general. The important exception is the Gaussian case.

Theorem 2.19 If the random variables have a joint Gaussian distribution,
then they are independent if and only if they are uncorrelated.


Definition 2.20 The covariance matrix of a random vector
$X = (X_1, X_2, \ldots, X_n)$ is the $n \times n$ matrix with the elements $\mathrm{Cov}(X_i, X_j)$.
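The covariance formulas (2.10) and (2.12), and the remark that uncorrelated variables need not be independent, can be checked on a small finite sample space. The sketch below is only an illustration: the three-point space, the choice $Y = X^2$ and all function names are assumptions made for the example.

```python
from fractions import Fraction

def expectation(values, probs):
    """E(V) on a finite sample space: sum of value times probability."""
    return sum(v * p for v, p in zip(values, probs))

def covariance(xs, ys, probs):
    """Cov(X, Y) = E(XY) - EX EY, formula (2.10)."""
    exy = expectation([x * y for x, y in zip(xs, ys)], probs)
    return exy - expectation(xs, probs) * expectation(ys, probs)

def variance(xs, probs):
    """Var(X) = Cov(X, X)."""
    return covariance(xs, xs, probs)

# Illustrative sample space: three equally likely outcomes.
probs = [Fraction(1, 3)] * 3
xs = [-1, 0, 1]               # values of X
ys = [x * x for x in xs]      # Y = X**2 is a function of X, hence dependent on X

# Both sides of (2.12).
sums = [x + y for x, y in zip(xs, ys)]
lhs = variance(sums, probs)
rhs = variance(xs, probs) + variance(ys, probs) + 2 * covariance(xs, ys, probs)

# P(X = 0, Y = 0) compared with P(X = 0) P(Y = 0).
p_joint = sum(p for x, y, p in zip(xs, ys, probs) if x == 0 and y == 0)
p_x0 = sum(p for x, p in zip(xs, probs) if x == 0)
p_y0 = sum(p for y, p in zip(ys, probs) if y == 0)

print(covariance(xs, ys, probs), lhs, rhs, p_joint, p_x0 * p_y0)
```

Here the covariance is zero and (2.12) holds exactly, yet $P(X = 0, Y = 0) = 1/3$ differs from $P(X = 0)P(Y = 0) = 1/9$, so $X$ and $Y$ are uncorrelated but not independent.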


2.6 Normal (Gaussian) Distributions 


The Normal (Gaussian) probability density is given by

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

It is completely specified by its mean $\mu$ and its standard deviation $\sigma$. The
Normal family $N(\mu, \sigma^2)$ is obtained from the Standard Normal Distribution,
$N(0, 1)$, by a linear transformation.

If $X$ is $N(\mu, \sigma^2)$ then $Z = \frac{X - \mu}{\sigma}$ is $N(0, 1)$ and $X = \mu + \sigma Z$.

An important property of the Normal family is that a linear combination
of independent Normal variables results in a Normal variable, that is, if
$X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent then $\alpha X_1 + \beta X_2 \sim
N(\alpha\mu_1 + \beta\mu_2, \alpha^2\sigma_1^2 + \beta^2\sigma_2^2)$. The moment generating function of $X$ with
$N(\mu, \sigma^2)$ distribution is given by

$$m(t) = Ee^{tX} = \int_{-\infty}^{\infty} e^{tx} f(x; \mu, \sigma^2)\,dx = e^{\mu t} e^{(\sigma t)^2/2} = e^{\mu t + (\sigma t)^2/2}.$$

A random vector $X = (X_1, X_2, \ldots, X_n)$ has an $n$-variate Normal (Gaussian)
distribution with mean vector $\mu$ and covariance matrix $\Sigma$ if there exists an $n \times n$
matrix $A$ such that its determinant $|A| \ne 0$, and $X = \mu + AZ$, where $Z =
(Z_1, Z_2, \ldots, Z_n)$ is the vector with independent standard Normal components,
and $\Sigma = AA^T$. Vectors are taken as column vectors here and in the sequel.
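The definition $X = \mu + AZ$ with $\Sigma = AA^T$ also gives a recipe for simulation. The sketch below treats the bivariate case with $\mu = 0$, unit variances and correlation $\rho$, using the lower-triangular matrix $A$ with rows $(1, 0)$ and $(\rho, \sqrt{1 - \rho^2})$. This is one possible choice of $A$ (other matrices with the same $AA^T$ work equally well), and the function names are hypothetical.

```python
import math
import random

def correlation_factor(rho: float):
    """A lower-triangular matrix A with A A^T = [[1, rho], [rho, 1]]."""
    return [[1.0, 0.0], [rho, math.sqrt(1 - rho * rho)]]

def times_transpose(a):
    """Return the matrix product A A^T for a square matrix given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sample_bivariate_normal(rho: float, rng: random.Random):
    """One draw of X = A Z, where Z has independent standard Normal components."""
    a = correlation_factor(rho)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return (a[0][0] * z1 + a[0][1] * z2, a[1][0] * z1 + a[1][1] * z2)

rho = 0.6
print(times_transpose(correlation_factor(rho)))  # close to [[1, 0.6], [0.6, 1]]

rng = random.Random(0)
pairs = [sample_bivariate_normal(rho, rng) for _ in range(20000)]
print(sum(x * y for x, y in pairs) / len(pairs))  # sample estimate of E(XY) = rho
```

The first line printed confirms $AA^T = \Sigma$ up to rounding, and the sample average of the products should be close to $\rho$, up to simulation error.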

The probability density of $Z$ is obtained by using independence of its
components; for independent random variables the densities multiply. Then
performing a change of variables in the multiple integral we find the probability
density of $X$

$$f_X(x) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}.$$


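Example 2.15: Let a bivariate Normal have $\mu = 0$ and $\Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$. Let
$X = (X, Y)$ and $x = (x, y)$. Then $X$ can be obtained from $Z$ by the transformation
$X = AZ$ with $A = \begin{pmatrix} 1 & 0 \\ \rho & \sqrt{1 - \rho^2} \end{pmatrix}$.

Since $x = z_1$ and $y = \rho z_1 + \sqrt{1 - \rho^2}\, z_2$, the inverse transformation
$z_1 = x$, $z_2 = (y - \rho x)/\sqrt{1 - \rho^2}$ has the Jacobian

$$J = \det \begin{pmatrix} \frac{\partial z_1}{\partial x} & \frac{\partial z_1}{\partial y} \\ \frac{\partial z_2}{\partial x} & \frac{\partial z_2}{\partial y} \end{pmatrix} = \det \begin{pmatrix} 1 & 0 \\ \frac{-\rho}{\sqrt{1 - \rho^2}} & \frac{1}{\sqrt{1 - \rho^2}} \end{pmatrix} = \frac{1}{\sqrt{1 - \rho^2}}.$$

The density of $Z$ is given by the product of standard Normal densities, by
independence, $f_Z(z_1, z_2) = \frac{1}{2\pi} e^{-\frac{1}{2}(z_1^2 + z_2^2)}$. Using the formula (2.4) we obtain the joint
density of the bivariate Normal

$$f_X(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\, e^{-\frac{1}{2(1 - \rho^2)}[x^2 - 2\rho xy + y^2]}.$$


It follows from the definition that if $X$ has a multivariate Normal distribution
and $a$ is a non-random vector then $aX = a(\mu + AZ) = a\mu + aAZ$. Since
a linear combination of independent Normal random variables is a Normal
random variable, $aAZ$ is a Normal random variable. Hence $aX$ has Normal
distribution with mean $a\mu$ and variance $(aA)(aA)^T = a\Sigma a^T$. Thus we have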


Theorem 2.21 A linear combination of jointly Gaussian random variables is 
a Gaussian random variable. 


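Similarly it can be shown that if $X \sim N(\mu, \Sigma)$ and $B$ is a non-random matrix,
then $BX \sim N(B\mu, B\Sigma B^T)$.
The moment generating function of a vector $X$ is defined as

$$E(e^{tX}) = E\left(e^{\sum_{i=1}^n t_i X_i}\right),$$

where $t = (t_1, t_2, \ldots, t_n)$, and $tX$ is the scalar product of vectors $t$ and $X$.
It is not hard to show that the moment generating function of a Gaussian
vector $X \sim N(\mu, \Sigma)$ is given by

$$M_X(t) = e^{t\mu + \frac{1}{2} t\Sigma t^T},$$

which reduces to $e^{\mu t + (\sigma t)^2/2}$ in the one-dimensional case.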


Definition 2.22 A collection of random variables is called a Gaussian pro- 
cess, if the joint distribution of any finite number of its members is Gaussian. 


Theorem 2.23 Let $X(t)$ be a process with independent Gaussian increments,
that is, for any $s < t$, $X(t) - X(s)$ has a Normal distribution, and is
independent of the values $X(u)$, $u \le s$ (the σ-field $\mathcal{F}_s$ generated by the process up to
time $s$). Then $X(t)$ is a Gaussian process.


See Section 3.1, Example 3.3 for the proof.


2.7 Conditional Expectation


Conditional Expectation and Conditional Distribution

The conditional distribution function of $X$ given $Y = y$ is defined by

$$P(X \le x \mid Y = y) = \frac{P(X \le x, Y = y)}{P(Y = y)},$$

provided $P(Y = y) > 0$. However, such an approach fails if the event we
condition on has zero probability, $P(Y = y) = 0$. This difficulty can be
overcome if $X, Y$ have a joint density $f(x, y)$. In this case it follows that both
$X$ and $Y$ possess densities $f_X(x)$ and $f_Y(y)$; $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$, and
$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$. The conditional distribution of $X$ given $Y = y$ is
defined by the conditional density

$$f(x|y) = \frac{f(x, y)}{f_Y(y)},$$

at any point where $f_Y(y) > 0$. It is easy to see that $f(x|y)$ so defined is indeed
a probability density for any $y$, as it is non-negative and integrates to unity.

The expectation of this distribution, when it exists, is called the conditional
expectation of $X$ given $Y = y$,

$$E(X|Y = y) = \int_{-\infty}^{\infty} x f(x|y)\,dx. \tag{2.13}$$

The conditional expectation $E(X|Y = y)$ is a function of $y$. Let $g$ denote
this function, $g(y) = E(X|Y = y)$, then by replacing $y$ by $Y$ we obtain a
random variable $g(Y)$, which is the conditional expectation of $X$ given $Y$,
$E(X|Y) = g(Y)$.


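Example 2.16: Let $X$ and $Y$ have a standard bivariate Normal distribution with
parameter $\rho$. Then

$$f(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}[x^2 - 2\rho xy + y^2]\right\}, \quad \text{and} \quad f_Y(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2},$$

so that

$$f(x|y) = \frac{f(x, y)}{f_Y(y)} = \frac{1}{\sqrt{2\pi(1 - \rho^2)}} \exp\left\{-\frac{(x - \rho y)^2}{2(1 - \rho^2)}\right\},$$

which is the $N(\rho y, 1 - \rho^2)$ distribution. Its mean is $\rho y$, therefore
$E(X|Y = y) = \rho y$, and $E(X|Y) = \rho Y$.

Similarly, it can be seen that in the multivariate Normal case the conditional
expectation is also a linear function of $Y$.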
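The conclusion of Example 2.16 can be checked numerically from the definitions alone: integrate the joint density over $x$ to obtain $f_Y(y)$, then integrate $x f(x, y)/f_Y(y)$ as in (2.13). The sketch below does this with a midpoint rule on a truncated range. The truncation limits, step count and function names are assumptions chosen for this illustration.

```python
import math

def joint_pdf(x: float, y: float, rho: float) -> float:
    """Standard bivariate Normal density with correlation parameter rho."""
    c = 1 - rho * rho
    return math.exp(-(x * x - 2 * rho * x * y + y * y) / (2 * c)) / (2 * math.pi * math.sqrt(c))

def midpoint_integral(f, a: float, b: float, steps: int = 4000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def marginal_pdf_y(y: float, rho: float, lo: float = -12.0, hi: float = 12.0) -> float:
    """f_Y(y), obtained by integrating the joint density over x."""
    return midpoint_integral(lambda x: joint_pdf(x, y, rho), lo, hi)

def conditional_mean(y: float, rho: float, lo: float = -12.0, hi: float = 12.0) -> float:
    """E(X | Y = y): integral of x * f(x, y) / f_Y(y) over x, formula (2.13)."""
    numerator = midpoint_integral(lambda x: x * joint_pdf(x, y, rho), lo, hi)
    return numerator / marginal_pdf_y(y, rho, lo, hi)

rho, y = 0.5, 1.2
print(conditional_mean(y, rho), rho * y)
```

The two printed numbers should agree closely, matching $E(X|Y = y) = \rho y$.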


The conditional distribution and the conditional expectation are defined only
at the points where $f_Y(y) > 0$. Both can be defined arbitrarily on the set
$\{y : f_Y(y) = 0\}$. Since there are many functions which agree on the set $\{y :
f_Y(y) > 0\}$, any one of them is called a version of the conditional distribution
(the conditional expectation) of $X$ given $Y = y$. The different versions of
$f(x|y)$ and $E(X|Y = y)$ differ only on the set $\{y : f_Y(y) = 0\}$, which has zero
probability under the distribution of $Y$; $f(x|y)$ and $E(X|Y = y)$ are defined
uniquely $Y$-almost surely.


General Conditional Expectation


The conditional expectation in a more general form is defined as follows. Let
$X$ be an integrable random variable. $E(X|Y) = G(Y)$ is a function of $Y$ such
that for any bounded function $h$,

$$E(Xh(Y)) = E(G(Y)h(Y)), \tag{2.14}$$

or $E((X - G(Y))h(Y)) = 0$. Existence of such a function is assured by the
Radon-Nikodym theorem from functional analysis. But uniqueness is easy to
prove. If there are two such functions, $G_1, G_2$, then $E((G_1(Y) - G_2(Y))h(Y)) =
0$. Take $h(y) = \mathrm{sign}(G_1(y) - G_2(y))$. Then we have $E|G_1(Y) - G_2(Y)| = 0$.
Thus $P(G_1(Y) = G_2(Y)) = 1$, and they coincide with ($P_Y$) probability one.

A more general conditional expectation of $X$ given a σ-field $\mathcal{G}$, $E(X|\mathcal{G})$, is
a $\mathcal{G}$-measurable random variable such that for any bounded $\mathcal{G}$-measurable $\xi$

$$E(\xi E(X|\mathcal{G})) = E(\xi X). \tag{2.15}$$

In the literature, $\xi = I_B$ is taken as the indicator function of a set $B \in \mathcal{G}$, which
is an equivalent condition: for any set $B \in \mathcal{G}$

$$\int_B X\,dP = \int_B E(X|\mathcal{G})\,dP, \quad \text{or} \quad E(XI(B)) = E(E(X|\mathcal{G})I(B)). \tag{2.16}$$

The Radon-Nikodym theorem (see Theorem 10.6) implies that such a
random variable exists and is almost surely unique, in the sense that any two
versions differ only on a set of probability zero.

The conditional expectation $E(X|Y)$ is given by $E(X|\mathcal{G})$ with $\mathcal{G} = \sigma(Y)$,
the σ-field generated by $Y$. Often the Equations (2.15) or (2.16) are not used,
because easier calculations are possible using various specific properties, but they
are used to establish the fundamental properties given below. In particular,
the conditional expectation defined in (2.13) by using densities satisfies (2.15)
or (2.16).


Properties of Conditional Expectation 


Conditional expectations are random variables. Their properties are stated
as equalities of two random variables. Random variables $X$ and $Y$, defined
on the same space, are equal if $P(X = Y) = 1$. This is also written $X = Y$
almost surely (a.s.). If not stated otherwise, whenever the equality of random
variables is used it is understood in the almost sure sense, and often writing
"almost surely" is omitted.


1. If $\mathcal{G}$ is the trivial field $\{\emptyset, \Omega\}$, then

$$E(X|\mathcal{G}) = EX. \tag{2.17}$$


2. If $X$ is $\mathcal{G}$-measurable, then

$$E(XY|\mathcal{G}) = XE(Y|\mathcal{G}). \tag{2.18}$$

This means that if $\mathcal{G}$ contains all the information about $X$, then given
$\mathcal{G}$, $X$ is known, and therefore it is treated as a constant.


3. If $\mathcal{G}_1 \subset \mathcal{G}_2$ then

$$E(E(X|\mathcal{G}_2)|\mathcal{G}_1) = E(X|\mathcal{G}_1). \tag{2.19}$$

This is known as the smoothing property of conditional expectation. In
particular by taking $\mathcal{G}_1$ to be the trivial field, we obtain the law of double
expectation

$$E(E(X|\mathcal{G})) = E(X). \tag{2.20}$$


4. If $\sigma(X)$ and $\mathcal{G}$ are independent, then

$$E(X|\mathcal{G}) = EX, \tag{2.21}$$

that is, if the information we know provides no clues about $X$, then the
conditional expectation is the same as the expectation. The next result
is an important generalization.


5. If $\sigma(X)$ and $\mathcal{G}$ are independent, and $\mathcal{F}$ and $\mathcal{G}$ are independent, and
$\sigma(\mathcal{F}, \mathcal{G})$ denotes the smallest σ-field containing both of them, then

$$E(X|\sigma(\mathcal{F}, \mathcal{G})) = E(X|\mathcal{F}). \tag{2.22}$$


6. Jensen's inequality. If $g(x)$ is a convex function on $I$, that is, for all
$x, y \in I$ and $\lambda \in (0, 1)$

$$g(\lambda x + (1 - \lambda)y) \le \lambda g(x) + (1 - \lambda)g(y),$$

and $X$ is a random variable with range $I$, then

$$g(E(X|\mathcal{G})) \le E(g(X)|\mathcal{G}). \tag{2.23}$$

In particular, with $g(x) = |x|$

$$|E(X|\mathcal{G})| \le E(|X|\,|\,\mathcal{G}). \tag{2.24}$$


7. Monotone convergence. If $0 \le X_n$, and $X_n \uparrow X$ with $E|X| < \infty$, then

$$E(X_n|\mathcal{G}) \uparrow E(X|\mathcal{G}). \tag{2.25}$$


8. Fatou's lemma. If $0 \le X_n$, then

$$E(\liminf_n X_n|\mathcal{G}) \le \liminf_n E(X_n|\mathcal{G}). \tag{2.26}$$


9. Dominated convergence. If $\lim_{n\to\infty} X_n = X$ almost surely and $|X_n| \le Y$
with $EY < \infty$, then

$$\lim_{n\to\infty} E(X_n|\mathcal{G}) = E(X|\mathcal{G}). \tag{2.27}$$
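On a finite sample space a σ-field is generated by a partition, and the conditional expectation is the probability-weighted average over each block of the partition. The sketch below uses this to check the smoothing property (2.19) and the law of double expectation (2.20) with exact arithmetic. The fair-die sample space, the two partitions and the function name are assumptions made for the illustration.

```python
from fractions import Fraction

def conditional_expectation(x, partition, probs):
    """E(X | G) when G is generated by a finite partition of the sample space.

    x and probs are dicts keyed by outcome; the result is a dict giving, for each
    outcome, the probability-weighted average of X over the block containing it.
    """
    result = {}
    for block in partition:
        p_block = sum(probs[w] for w in block)
        average = sum(x[w] * probs[w] for w in block) / p_block
        for w in block:
            result[w] = average
    return result

# Illustrative model: a fair die, with X the square of the outcome.
omega = range(1, 7)
probs = {w: Fraction(1, 6) for w in omega}
x = {w: w * w for w in omega}

g1 = [{1, 2, 3}, {4, 5, 6}]        # coarse information: low or high
g2 = [{1}, {2, 3}, {4, 5}, {6}]    # finer information; each block of g1 is a union of blocks of g2

inner = conditional_expectation(x, g2, probs)       # E(X | G2)
smoothed = conditional_expectation(inner, g1, probs)  # E(E(X | G2) | G1)
direct = conditional_expectation(x, g1, probs)      # E(X | G1)

print(smoothed == direct)
print(sum(inner[w] * probs[w] for w in omega), sum(x[w] * probs[w] for w in omega))
```

The first line printed is `True`, and the two expectations on the second line coincide, as (2.19) and (2.20) require.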


For results on conditional expectations see e.g. Breiman (1968), Chapter 4.
The conditional probability $P(A|\mathcal{G})$ is defined as the conditional expectation
of the indicator function,

$$P(A|\mathcal{G}) = E(I_A|\mathcal{G}),$$

and it is a $\mathcal{G}$-measurable random variable, defined $P$-almost surely.
The following results are often used.


Theorem 2.24 Let $X$ and $Y$ be two independent random variables and $\phi(x, y)$
be such that $E|\phi(X, Y)| < +\infty$. Then

$$E(\phi(X, Y)|Y) = G(Y),$$

where $G(y) = E(\phi(X, y))$.

Theorem 2.25 Let $(X, Y)$ be a Gaussian vector. Then the conditional
distribution of $X$ given $Y$ is also Gaussian. Moreover, provided the matrix
$\mathrm{Cov}(Y, Y)$ is non-singular (has the inverse),

$$E(X|Y) = E(X) + \mathrm{Cov}(X, Y)\,\mathrm{Cov}^{-1}(Y, Y)(Y - E(Y)).$$

In the case when $\mathrm{Cov}(Y, Y)$ is singular, the same formula holds with the
inverse replaced by the generalized inverse, the Moore-Penrose pseudoinverse
matrix.

If one wants to predict/estimate $X$ by using observations on $Y$, then a
predictor is some function of $Y$. For a square-integrable $X$, $E(X^2) < \infty$, the
best predictor $\hat{X}$, by definition, minimizes the mean-square error. It is easy to
show


Theorem 2.26 (Best Estimator/Predictor) Let $\hat{X}$ be such that for any
$Y$-measurable random variable $Z$,

$$E(X - \hat{X})^2 \le E(X - Z)^2.$$

Then $\hat{X} = E(X|Y)$.


2.8 Stochastic Processes in Continuous Time


The construction of a mathematical model of uncertainty and of flow of
information in continuous time follows the same ideas as in the discrete time,
but it is much more complicated. Consider constructing a probability model
for a process $S(t)$ when time changes continuously between 0 and $T$. Take for
the sample space the set of all possibilities of movements of the process. If
we make a simplifying assumption that the process changes continuously, we
obtain the set of all continuous functions on $[0, T]$, denoted by $C[0, T]$. This
is a very rich space. In a more general model it is assumed that the observed
process is a right-continuous function with left limits (a regular right-continuous
(RRC, cadlag) function).

Let the sample space $\Omega = D[0, T]$ be the set of all RRC functions on $[0, T]$.
An element of this set, $\omega$, is a RRC function from $[0, T]$ into $\mathbb{R}$. First we
must decide what kind of sets of these functions are measurable. The simplest
sets for which we would like to calculate the probabilities are sets of the form
$\{a \le S(t_1) \le b\}$ for some $t_1$. If $S(t)$ represents the price of a stock at time $t$,
then the probability of such a set gives the probability that the stock price at
time $t_1$ is between $a$ and $b$. We are also interested in how the price of stock at
time $t_1$ affects the price at another time $t_2$. Thus we need to talk about the
joint distribution of stock prices $S(t_1)$ and $S(t_2)$. This means that we need to
define probability on the sets of the form $\{S(t_1) \in B_1, S(t_2) \in B_2\}$ where $B_1$
and $B_2$ are intervals on the line. More generally we would like to have all
finite-dimensional distributions of the process $S(t)$, that is, probabilities of the sets
$\{S(t_1) \in B_1, \ldots, S(t_n) \in B_n\}$, for any choice of $0 \le t_1 < t_2 < \ldots < t_n \le T$.
The sets of the form $\{\omega(\cdot) \in D[0, T] : \omega(t_1) \in B_1, \ldots, \omega(t_n) \in B_n\}$, where
$B_i$'s are intervals on the line, are called cylinder sets or finite-dimensional
rectangles. The stochastic process $S(t)$ on this sample space is just $\omega(t)$, the
value of the function $\omega$ at $t$. Probability is defined first on the cylinder sets,
and then extended to the σ-field $\mathcal{F}$ generated by the cylinders, that is, the
smallest σ-field containing all cylinder sets. One needs to be careful with
consistency of probability defined on cylinder sets, so that when one cylinder
contains another no contradiction of probability assignment is obtained. The
result that shows that a consistent family of distributions defines a probability
function, continuous at $\emptyset$ on the field of cylinder sets, is known as Kolmogorov's
extension theorem. Once a probability is defined on the field of cylinder sets, it
can be extended in a unique way (by Caratheodory's theorem) to $\mathcal{F}$ (see for
example, Breiman (1968), Durrett (1991) or Dudley (1989) for details).

It follows immediately from this construction that: a) for any choice of
$0 \le t_1 \le t_2 \le \ldots \le t_n \le T$, $(S(t_1), S(t_2), \ldots, S(t_n))$ is a random vector, and b)
that the process is determined by its finite-dimensional distributions.
Continuity and Regularity of Paths


As discussed in the previous section, a stochastic process is determined by its
finite-dimensional distributions. In studying stochastic processes it is often
natural to think of them as random functions in $t$. Let $S(t)$ be defined for
$0 \le t \le T$, then for a fixed $\omega$ it is a function in $t$, called the sample path
or a realization of $S$. Finite-dimensional distributions do not determine the
continuity property of sample paths. The following example illustrates this.


Example 2.17: Let $X(t) = 0$ for all $t$, $0 \le t \le 1$, and $\tau$ be a uniformly distributed
random variable on $[0,1]$. Let $Y(t) = 0$ for $t \ne \tau$ and $Y(t) = 1$ if $t = \tau$. Then
for any fixed $t$, $P(Y(t) \ne 0) = P(\tau = t) = 0$, hence $P(Y(t) = 0) = 1$. So all
one-dimensional distributions of $X(t)$ and $Y(t)$ are the same. Similarly all
finite-dimensional distributions of $X$ and $Y$ are the same. However, the sample paths of
the process $X$, that is, the functions $X(t)$, $0 \le t \le 1$, are continuous in $t$, whereas every
sample path $Y(t)$, $0 \le t \le 1$, has a jump at the point $\tau$. Notice that $P(X(t) = Y(t)) = 1$
for all $t$, $0 \le t \le 1$.
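The distinction in this example can be checked by a small simulation. The sketch below is only an illustration under stated assumptions: the helper `y_path`, the sample size and the fixed time 0.5 are hypothetical choices, and the jump location is sampled with Python's pseudo-random generator as a stand-in for a uniform random variable on [0, 1].

```python
import random

def y_path(t, tau):
    """Value at time t of the path of Y whose jump is located at tau."""
    return 1.0 if t == tau else 0.0

rng = random.Random(0)
# Samples of the uniformly distributed jump location (one per path).
taus = [rng.random() for _ in range(10_000)]

t_fixed = 0.5
# Share of sampled paths with Y(t_fixed) = 0, i.e. agreeing with X(t_fixed) = 0.
share_equal_at_fixed_t = sum(y_path(t_fixed, tau) == 0.0 for tau in taus) / len(taus)
# Every sampled path nevertheless takes the value 1 at its own jump point.
every_path_jumps = all(y_path(tau, tau) == 1.0 for tau in taus)

print(share_equal_at_fixed_t, every_path_jumps)
```

At any fixed time the two processes agree on essentially every sampled path, while each individual path of Y still differs from the zero path at one (random) point.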


Definition 2.27 Two stochastic processes are called versions (modifications)
of one another if

$$P(X(t) = Y(t)) = 1 \quad \text{for all } t,\ 0 \le t \le T.$$


Thus the two processes in Example 2.17 are versions of one another; one
has continuous sample paths and the other does not. If we agree to pick any
version of the process we want, then we can pick the continuous version when
it exists. In general we choose the smoothest possible version of the process.

For two processes $X$ and $Y$, denote $N_t = \{X(t) \ne Y(t)\}$, $0 \le t \le T$. In
the above Example 2.17, $P(N_t) = P(\tau = t) = 0$ for any $t$, $0 \le t \le 1$. However,
$P(\bigcup_{0 \le t \le 1} N_t) = P(\tau = t \text{ for some } t \text{ in } [0,1]) = 1$. Although each $N_t$ is a
$P$-null set, the union $N = \bigcup_{0 \le t \le 1} N_t$ contains uncountably many null sets,
and in this case it is a set of probability one.

If it happens that $P(N) = 0$, then $N$ is called an evanescent set, and the
processes $X$ and $Y$ are called indistinguishable. Note that in this case
$P(\{\omega : \exists t : X(t) \ne Y(t)\}) = P(\bigcup_{0 \le t \le T} \{X(t) \ne Y(t)\}) = 0$, and
$P(\bigcap_{0 \le t \le T} \{X(t) = Y(t)\}) = P(X(t) = Y(t) \text{ for all } t \in [0,T]) = 1$. It is clear
that if the time is discrete then any two versions of the process are
indistinguishable. It is also not hard to see that if $X(t)$ and $Y(t)$ are versions of one
another and they both are right-continuous, then they are indistinguishable.

Conditions for the existence of the continuous and the regular (paths with 
only jump discontinuities) versions of a stochastic process are given below. 


Theorem 2.28 Let $S(t)$, $0 \le t \le T$, be an $\mathbb{R}$-valued stochastic process.
1. If there exist $\alpha > 0$ and $\varepsilon > 0$, so that for any $0 \le u \le t \le T$,

$$E|S(t) - S(u)|^{\alpha} \le C(t - u)^{1+\varepsilon}, \qquad (2.28)$$

for some constant $C$, then there exists a version of $S$ with continuous
sample paths, which are Hölder continuous of order $h < \varepsilon/\alpha$.


2. If there exist $C > 0$, $\alpha_1 > 0$, $\alpha_2 > 0$ and $\varepsilon > 0$, so that for any
$0 \le u \le v \le t \le T$,

$$E\left(|S(v) - S(u)|^{\alpha_1} |S(t) - S(v)|^{\alpha_2}\right) \le C(t - u)^{1+\varepsilon}, \qquad (2.29)$$

then there exists a version of $S$ with paths that may have discontinuities
of the first kind only (which means that at any interior point both right
and left limits exist, and one-sided limits exist at the boundaries).


Note that the above result allows one to decide on the existence of the
continuous (regular) version by means of the joint bivariate (trivariate) distributions
of the process. The same result applies when the process takes values in $\mathbb{R}^d$,
except that the Euclidean distance replaces the absolute value in the above
conditions.

Functions without discontinuities of the second kind are considered to be 
the same if at all points of the domain they have the same right and left limits. 
In this case it is possible to identify any such function with its right-continuous 
version. 

The following result gives a condition for the existence of a regular right- 
continuous version of a stochastic process. 


Theorem 2.29 If the stochastic process $S(t)$ is right-continuous in probability
(that is, for any $t$ the limit in probability $\lim_{u \downarrow t} S(u) = S(t)$) and it does not
have discontinuities of the second kind, then it has a right-continuous version.


Other conditions for the regularity of paths can be given if we know some
particular properties of the process. For example, later we give such conditions
for processes that are martingales and supermartingales.


σ-field Generated by a Stochastic Process


$\mathcal{F}_t = \sigma(S_u, u \le t)$ is the smallest $\sigma$-field that contains sets of the form
$\{a \le S_u \le b\}$ for $0 \le u \le t$, $a, b \in \mathbb{R}$. It is the information available to an observer
of the process $S$ up to time $t$.


Filtered Probability Space and Adapted Processes 


A filtration $\mathbb{F}$ is a family $\{\mathcal{F}_t\}$ of increasing $\sigma$-fields on $(\Omega, \mathcal{F})$, $\mathcal{F}_t \subset \mathcal{F}$. $\mathbb{F}$
specifies how the information is revealed in time. The property that a filtration
is increasing corresponds to the fact that the information is not forgotten.
If we have a set $\Omega$, a $\sigma$-field of subsets of $\Omega$, $\mathcal{F}$, a probability $P$ defined on
elements of $\mathcal{F}$, and a filtration $\mathbb{F}$ such that

$$\mathcal{F}_0 \subseteq \mathcal{F}_t \subseteq \ldots \subseteq \mathcal{F}_T = \mathcal{F},$$

then $(\Omega, \mathcal{F}, \mathbb{F}, P)$ is called a filtered probability space.

A stochastic process on this space $\{S(t), 0 \le t \le T\}$ is called adapted if for
all $t$, $S(t)$ is $\mathcal{F}_t$-measurable, that is, if for any $t$, $\mathcal{F}_t$ contains all the information
about $S(t)$ (and may contain extra information).


The Usual Conditions 


A filtration is called right-continuous if $\mathcal{F}_{t+} = \mathcal{F}_t$, where

$$\mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s.$$

The standard assumption (referred to as the usual condition) is that the filtration
is right-continuous, that is, for all $t$, $\mathcal{F}_t = \mathcal{F}_{t+}$. It has the following interpretation:
any information known immediately after $t$ is also known at $t$.


Remark 2.2: Note that if $S(t)$ is a process adapted to $\mathbb{F}$, then we can
always take a right-continuous filtration to which $S(t)$ is adapted by taking
$\mathcal{G}_t = \mathcal{F}_{t+} = \bigcap_{s > t} \mathcal{F}_s$. Then $S_t$ is $\mathbb{G}$-adapted.

The assumption of right-continuous filtration has a number of important
consequences. For example, it allows us to assume that martingales,
submartingales and supermartingales have a regular right-continuous version.


It is also assumed that any set which is a subset of a set of zero probability
is $\mathcal{F}_0$-measurable. Of course, such a set must have zero probability. A priori
such sets need not be measurable, and we enlarge the $\sigma$-fields to include such
sets. This procedure is called the completion by the null sets.


Martingales, Supermartingales, Submartingales 
Definition 2.30 A stochastic process $\{X(t), t \ge 0\}$ adapted to a filtration $\mathbb{F}$
is a supermartingale (submartingale) if for any $t$ it is integrable, $E|X(t)| < \infty$,
and for any $s < t$

$$E(X(t)|\mathcal{F}_s) \le X(s), \qquad \left(E(X(t)|\mathcal{F}_s) \ge X(s)\right).$$

If $E(X(t)|\mathcal{F}_s) = X(s)$, then the process $X(t)$ is called a martingale.


An example of a martingale is given by the following 
Theorem 2.31 (Doob-Levy martingale) Let $Y$ be an integrable random
variable, that is, $E|Y| < \infty$, then

$$M(t) = E(Y|\mathcal{F}_t) \qquad (2.30)$$

is a martingale.
PROOF: By the law of double expectation, for $s < t$,

$$E(M(t)|\mathcal{F}_s) = E(E(Y|\mathcal{F}_t)|\mathcal{F}_s) = E(Y|\mathcal{F}_s) = M(s).$$


Using the law of double expectation, it is easy to see that the mean of a 
martingale is a constant in t, the mean of a supermartingale is non-increasing 
in t, and the mean of a submartingale is non-decreasing in t. 

If X(t) is a supermartingale, then —X (t) is a submartingale, directly from 
the definition. 

We have the following result for the existence of the right-continuous ver- 
sion for super or submartingales, without the assumption of continuity in prob- 
ability imposed on the process (see for example, Liptser and Shiryaev (1974), 
Vol I, p.55). 


Theorem 2.32 Let the filtration $\mathbb{F}$ be right-continuous and each of the
$\sigma$-fields $\mathcal{F}_t$ be completed by the $P$-null sets from $\mathcal{F}$. In order that the
supermartingale $X(t)$ has a right-continuous version, it is necessary and sufficient
that its mean function $EX(t)$ is right-continuous. In particular, any
martingale with right-continuous filtration admits a regular right-continuous version.


In view of these results, it will often be assumed that the version of the process 
under consideration is regular and right-continuous (cadlag). 


Stopping Times 
Definition 2.33 A non-negative random variable $\tau$, which is allowed to take
the value $\infty$, is called a stopping time (with respect to filtration $\mathbb{F}$) if for each
$t$, the event

$$\{\tau \le t\} \in \mathcal{F}_t.$$

It is immediate that for all $t$, the complementary event $\{\tau > t\} \in \mathcal{F}_t$. If $\tau < t$,
then for some $n$, $\tau \le t - 1/n$. Thus

$$\{\tau < t\} = \bigcup_{n=1}^{\infty} \{\tau \le t - 1/n\}.$$

The event $\{\tau \le t - 1/n\} \in \mathcal{F}_{t-1/n}$. Since the $\mathcal{F}_t$ are increasing, $\{\tau \le t - 1/n\} \in \mathcal{F}_t$,
therefore $\{\tau < t\} \in \mathcal{F}_t$.
Introduce

$$\mathcal{F}_{t-} = \bigvee_{s<t} \mathcal{F}_s = \sigma\Big(\bigcup_{s<t} \mathcal{F}_s\Big).$$

The above argument shows that $\{\tau < t\} \in \mathcal{F}_{t-}$.


Theorem 2.34 If the filtration is right-continuous, then $\tau$ is a stopping time if
and only if for each $t$, the event $\{\tau < t\} \in \mathcal{F}_t$.


PROOF: One direction has just been established, the other one is seen as
follows.

$$\{\tau \le t\} = \bigcap_{n=1}^{\infty} \{\tau < t + 1/n\}.$$

Since $\{\tau < t + 1/n\} \in \mathcal{F}_{t+1/n}$, by right-continuity of $\mathbb{F}$, $\{\tau \le t\} \in \mathcal{F}_t$.


The assumption of right-continuity of $\mathbb{F}$ is important when studying exit
times and hitting times of a set by a process. If $S(t)$ is a random process on
$\mathbb{R}$ adapted to $\mathbb{F}$, then the hitting time of a set $A$ is defined as

$$T_A = \inf\{t \ge 0 : S(t) \in A\}. \qquad (2.31)$$

The first exit time from a set $D$ is defined as

$$\tau_D = \inf\{t \ge 0 : S(t) \notin D\}. \qquad (2.32)$$

Note that $\tau_D = T_{\mathbb{R} \setminus D}$.
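On a path observed only at grid times, the hitting and exit times can be approximated by scanning the path. The following is a minimal sketch under that assumption; the function names, the sample grid and the sample values are hypothetical, and returning infinity when the set is never reached follows the convention that the infimum of the empty set is infinite.

```python
import math

def first_exit_time(times, values, a, b):
    """First grid time at which the path is outside the open interval (a, b).

    Returns math.inf if the path stays in (a, b) at every grid time.
    """
    for t, s in zip(times, values):
        if not (a < s < b):
            return t
    return math.inf

def hitting_time(times, values, a, b):
    """First grid time at which the path lies in the closed interval [a, b]."""
    for t, s in zip(times, values):
        if a <= s <= b:
            return t
    return math.inf

# A hypothetical path sampled at five grid times.
times = [0.0, 0.1, 0.2, 0.3, 0.4]
path = [0.0, 0.4, 0.9, 1.2, 0.7]

print(first_exit_time(times, path, -1.0, 1.0))  # exit from (-1, 1)
print(hitting_time(times, path, 1.0, 2.0))      # hitting of [1, 2]
print(hitting_time(times, path, 5.0, 6.0))      # never reached on this grid
```

Note that whether the time returned is at most t depends only on the path values up to t, which is the discrete analogue of the stopping time property.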


Theorem 2.35 Let $S(t)$ be continuous and adapted to $\mathbb{F}$. If $D = (a,b)$ is
an open interval, or any other open set on $\mathbb{R}$ (a countable union of open
intervals), then $\tau_D$ is a stopping time. If $A = [a,b]$ is a closed interval, or any
other closed set on $\mathbb{R}$ (its complement is an open set), then $T_A$ is a stopping
time. If in addition the filtration $\mathbb{F}$ is right-continuous then also for closed
sets $D$ and open sets $A$, $\tau_D$ and $T_A$ are stopping times.


PROOF: $\{\tau_D > t\} = \{S(u) \in D, \text{ for all } u \le t\} = \bigcap_{0 \le u \le t} \{S(u) \in D\}$. This
event is an uncountable intersection over all $u \le t$ of events in $\mathcal{F}_u$. The point
of the proof is to represent this event as a countable intersection. Due to
continuity of $S(u)$ and $D$ being open, for any irrational $u$ with $S(u) \in D$ there
is a rational $q$ with $S(q) \in D$. Therefore

$$\bigcap_{0 \le u \le t} \{S(u) \in D\} = \bigcap_{0 \le q \le t,\ q \text{ rational}} \{S(q) \in D\},$$

which is now a countable intersection of events from $\mathcal{F}_t$, and hence is itself
in $\mathcal{F}_t$. This shows that $\tau_D$ is a stopping time. Since for any closed set $A$, $\mathbb{R} \setminus A$
is open, and $T_A = \tau_{\mathbb{R} \setminus A}$, $T_A$ is also a stopping time.
Assume now that the filtration is right-continuous. If $D = [a,b]$ is a closed
interval, then $D = \bigcap_{n=1}^{\infty} (a - 1/n, b + 1/n)$. If $D_n = (a - 1/n, b + 1/n)$, then
$\tau_{D_n}$ is a stopping time, and the event $\{\tau_{D_n} > t\} \in \mathcal{F}_t$. It is easy to see that
$\bigcap_{n=1}^{\infty} \{\tau_{D_n} > t\} = \{\tau_D \ge t\}$, hence $\{\tau_D \ge t\} \in \mathcal{F}_t$, and also $\{\tau_D < t\} \in \mathcal{F}_t$ as
its complement, for any $t$. The rest of the proof follows by Theorem 2.34.


For general processes the following result holds. 


Theorem 2.36 Let $S(t)$ be regular right-continuous and adapted to $\mathbb{F}$, and
$\mathbb{F}$ be right-continuous. If $A$ is an open set on $\mathbb{R}$, then $T_A$ is a stopping time.
If $A$ is a closed set, then $\inf\{t > 0 : S(t) \in A, \text{ or } S(t-) \in A\}$ is a stopping
time.
It is possible, although much harder, to show that the hitting time of a Borel
set is a stopping time.

The next results give basic properties of stopping times.
Theorem 2.37 Let $S$ and $T$ be two stopping times, then $\min(S, T)$, $\max(S, T)$,
$S + T$ are all stopping times.


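σ-field $\mathcal{F}_T$


If $T$ is a stopping time, events observed before or at time $T$ are described by
the $\sigma$-field $\mathcal{F}_T$, defined as the collection of sets

$$\mathcal{F}_T = \{A \in \mathcal{F} : \text{for any } t,\ A \cap \{T \le t\} \in \mathcal{F}_t\}.$$


Theorem 2.38 Let $S$ and $T$ be two stopping times. The following properties
hold: If $A \in \mathcal{F}_S$, then $A \cap \{S = T\} \in \mathcal{F}_T$, consequently $\{S = T\} \in \mathcal{F}_S \cap \mathcal{F}_T$.
If $A \in \mathcal{F}_S$, then $A \cap \{S \le T\} \in \mathcal{F}_T$, consequently $\{S \le T\} \in \mathcal{F}_S \cap \mathcal{F}_T$.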


Fubini’s Theorem 


Fubini's theorem allows us to interchange integrals (sums) and expectations.
We give a particular case of Fubini's theorem, formulated in the way we
use it in applications.


Theorem 2.39 Let $X(t)$ be a stochastic process, $0 \le t \le T$ (for all $t$, $X(t)$ is
a random variable), with regular sample paths (for all $\omega$ at any point $t$, $X(t)$
has left and right limits). Then

$$\int_0^T E|X(t)|\,dt = E\left(\int_0^T |X(t)|\,dt\right).$$

Furthermore if this quantity is finite, then

$$E\left(\int_0^T X(t)\,dt\right) = \int_0^T E(X(t))\,dt.$$
Chapter 3


Basic Stochastic Processes 


This chapter is mainly about Brownian motion. It is the main process in the 
calculus of continuous processes. The Poisson process is the main process in 
the calculus of processes with jumps. Both processes give rise to functions of 
positive quadratic variation. For Stochastic Calculus only Sections 3.1-3.5 are
needed, but in applications other sections are also used.


Introduction 


Observations of prices of stocks, positions of a diffusing particle and many 
other processes observed in time are often modelled by a stochastic process. A 
stochastic process is an umbrella term for any collection of random variables 
{X(t)} depending on time t. Time can be discrete, for example, t = 0,1,2,..., 
or continuous, t > 0. Calculus is suited more to continuous time processes. 
At any time $t$, the observation is described by a random variable which we
denote by $X_t$ or $X(t)$. A stochastic process $\{X(t)\}$ is frequently denoted by
$X$ or with a slight abuse of notation also by $X(t)$.

In practice, we typically observe only a single realization of this process,
a single path, out of a multitude of possible paths. Any single path is a
function of time $t$, $x_t = x(t)$, $0 \le t \le T$; and the process can also be seen
as a random function. To describe the distribution and to be able to do
probability calculations about the uncertain future, one needs to know the
so-called finite-dimensional distributions. Namely, we need to specify how
to calculate probabilities of the form $P(X(t) \le x)$ for any time $t$, i.e. the
probability distribution of the random variable $X(t)$; and probabilities of the
form $P(X(t_1) \le x_1, X(t_2) \le x_2)$ for any times $t_1, t_2$, i.e. the joint bivariate
distributions of $X(t_1)$ and $X(t_2)$; and probabilities of the form

$$P(X(t_1) \le x_1, X(t_2) \le x_2, \ldots, X(t_n) \le x_n), \qquad (3.1)$$

for any choice of time points $0 \le t_1 < t_2 < \ldots < t_n \le T$, and any $n \ge 1$ with
$x_1, \ldots, x_n \in \mathbb{R}$. Often one does not write the formula for (3.1), but merely
points out how to compute it.


3.1 Brownian Motion 


Botanist R. Brown described the motion of a pollen particle suspended in
fluid in 1828. It was observed that a particle moved in an irregular, random
fashion. A. Einstein, in 1905, argued that the movement is due to
bombardment of the particle by the molecules of the fluid, and he obtained the equations
for Brownian motion. In 1900, L. Bachelier used the Brownian motion as a
model for movement of stock prices in his mathematical theory of speculation.
The mathematical foundation for Brownian motion as a stochastic process was
laid by N. Wiener in 1931, and this process is also called the Wiener process.

The Brownian Motion process B(t) serves as a basic model for the cu- 
mulative effect of pure noise. If B(t) denotes the position of a particle at 
time t, then the displacement B(t) — B(0) is the effect of the purely random 
bombardment by the molecules of the fluid, or the effect of noise over time t. 


Defining Properties of Brownian Motion 
Brownian motion {B(t)} is a stochastic process with the following properties. 


1. (Independence of increments) $B(t) - B(s)$, for $t > s$, is independent of
the past, that is, of $B(u)$, $0 \le u \le s$, or of $\mathcal{F}_s$, the $\sigma$-field generated by
$B(u)$, $u \le s$.


2. (Normal increments) $B(t) - B(s)$ has Normal distribution with mean 0
and variance $t - s$. This implies (taking $s = 0$) that $B(t) - B(0)$ has
$N(0, t)$ distribution.


3. (Continuity of paths) $B(t)$, $t \ge 0$, are continuous functions of $t$.


The initial position of Brownian motion is not specified in the definition.
When $B(0) = x$, then the process is Brownian motion started at $x$. Properties
1 and 2 above determine all the finite-dimensional distributions (see (3.4)
below) and it is possible to show (see Theorem 3.3) that all of them are Gaussian.
$P_x$ denotes the probability of events when the process starts at $x$. The time
interval on which Brownian motion is defined is $[0, T]$ for some $T > 0$, which
is allowed to be infinite.
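Properties 1 and 2 suggest a direct way to simulate a Brownian path on a time grid: add up independent Normal increments with mean 0 and variance equal to the grid step. The sketch below assumes an equally spaced grid; the function name `brownian_path` and its parameters are hypothetical, and the result is only a discrete approximation of a continuous path.

```python
import math
import random

def brownian_path(T, n, x0=0.0, seed=None):
    """Simulate B at the grid times 0, T/n, 2T/n, ..., T, started at x0.

    Each increment is an independent N(0, T/n) variable, in line with
    the independence and normality of increments.
    """
    rng = random.Random(seed)
    dt = T / n
    times = [i * dt for i in range(n + 1)]
    values = [x0]
    for _ in range(n):
        values.append(values[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return times, values

times, values = brownian_path(T=1.0, n=1000, seed=1)
print(len(values), values[0], values[-1])
```

Passing a nonzero `x0` gives an approximation of Brownian motion started at x.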

We do not prove here that a Brownian motion exists; a proof can be found in many
books on stochastic processes, and one construction is outlined in Section 5.7.
However we can deduce continuity of paths by using normality of increments
and appealing to Theorem 2.28. Since

$$E(B(t) - B(s))^4 = 3(t - s)^2,$$

the condition of Theorem 2.28 holds with fourth moments and the exponent 2 on
the right, so a continuous version of Brownian motion exists.


Remark 3.1: A definition of Brownian motion in a more general model (that
contains extra information) is given by a pair $\{B(t), \mathcal{F}_t\}$, $t \ge 0$, where $\mathcal{F}_t$ is
an increasing family of $\sigma$-fields (a filtration), $B(t)$ is an adapted process, i.e.
$B(t)$ is $\mathcal{F}_t$-measurable, such that Properties 1-3 above hold.


An important representation used for calculations in processes with inde- 
pendent increments is that for any s > 0 


B(t +s) = B(s) + (B(t + s) — B(s)), (3.2) 


where the two summands are independent. An extension of this representation is
the process version.
Let $W(t) = B(t + s) - B(s)$. Then for a fixed $s$, as a process in $t$, $W(t)$ is a
Brownian motion started at 0. This is seen by verifying the defining properties.
Other examples of Brownian motion processes constructed from other
processes are given below, as well as in exercises.


Example 3.1: Although B(t) — B(s) is independent of the past, 2B(t) — B(s) or 
B(t) — 2B(s) is not, as, for example, B(t) — 2B(s) = (B(t) — B(s)) — B(s), is a sum 
of two variables, with only one independent of the past and B(s). 


The following example illustrates calculations of some probabilities for Brow- 
nian motion. 


Example 3.2: Let $B(0) = 0$.

We calculate $P(B(t) \le 0 \text{ for } t = 2)$ and $P(B(t) \le 0 \text{ for } t = 0, 1, 2)$.

Since $B(2)$ has Normal distribution with mean zero and variance 2,

$P(B(t) \le 0 \text{ for } t = 2) = \frac{1}{2}$.

Since $B(0) = 0$, $P(B(t) \le 0 \text{ for } t = 0,1,2) = P(B(1) \le 0, B(2) \le 0)$. Note that $B(2)$
and $B(1)$ are not independent, therefore this probability cannot be calculated as a
product $P(B(1) \le 0)P(B(2) \le 0) = 1/4$. Using the decomposition $B(2) = B(1) +
(B(2) - B(1)) = B(1) + W(1)$, where the two random variables are independent, we
have

$$P(B(1) \le 0, B(2) \le 0) = P(B(1) \le 0, B(1) + W(1) \le 0)
= P(B(1) \le 0, W(1) \le -B(1)).$$

By conditioning and by using Theorem 2.24 and (2.20)

$$P(B(1) \le 0, W(1) \le -B(1)) = \int_{-\infty}^{0} P(W(1) \le -x) f(x)\,dx = \int_{-\infty}^{0} \Phi(-x)\,d\Phi(x),$$

where $\Phi(x)$ and $f(x)$ denote the distribution and the density functions of the standard
Normal distribution. By changing variables in the last integral, we obtain

$$\int_0^{\infty} \Phi(x) f(-x)\,dx = \int_0^{\infty} \Phi(x)\,d\Phi(x) = \int_{1/2}^{1} y\,dy = \frac{3}{8}.$$
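The value 3/8 can be checked numerically. The sketch below approximates the integral of $\Phi(-x) f(x)$ over $x \le 0$, where $\Phi$ and $f$ are the standard Normal distribution and density functions, by the midpoint rule; the truncation point -10 and the number of steps are arbitrary choices made for this illustration.

```python
import math

def std_normal_cdf(x):
    """Standard Normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    """Standard Normal density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def prob_both_nonpositive(lower=-10.0, steps=200_000):
    """Midpoint-rule approximation of the integral of Phi(-x) f(x) over (lower, 0)."""
    h = (0.0 - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        total += std_normal_cdf(-x) * std_normal_pdf(x)
    return total * h

p = prob_both_nonpositive()
print(p)  # numerically close to 3/8 = 0.375
```

The result is larger than the product 1/4 of the marginal probabilities, reflecting the positive dependence between B(1) and B(2).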


Transition Probability Functions 


If the process is started at $x$, $B(0) = x$, then $B(t)$ has the $N(x, t)$ distribution.
More generally, the conditional distribution of $B(t + s)$ given that $B(s) = x$
is $N(x, t)$. The transition function $P(y, t, x, s)$ is the cumulative distribution
function of this distribution,

$$P(y, t, x, s) = P(B(t + s) \le y \,|\, B(s) = x) = P_x(B(t) \le y).$$

The density function of this distribution is the transition probability
density function of Brownian motion,

$$p_t(x, y) = \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{(y - x)^2}{2t}}. \qquad (3.3)$$


The finite-dimensional distributions can be computed with the help of the
transition probability density function, by using independence of increments
in a way similar to that exhibited in the above example.

$$P_x(B(t_1) \le x_1, B(t_2) \le x_2, \ldots, B(t_n) \le x_n) = \qquad (3.4)$$

$$\int_{-\infty}^{x_1} p_{t_1}(x, y_1)\,dy_1 \int_{-\infty}^{x_2} p_{t_2 - t_1}(y_1, y_2)\,dy_2 \cdots \int_{-\infty}^{x_n} p_{t_n - t_{n-1}}(y_{n-1}, y_n)\,dy_n.$$


Space Homogeneity 


It is easy to see that the one-dimensional distributions of Brownian motion
satisfy $P_0(B(t) \in A) = P_x(B(t) \in x + A)$, where $A$ is an interval on the line.

If $B^x(t)$ denotes Brownian motion started at $x$, then it follows from (3.4)
that all finite-dimensional distributions of $B^x(t)$ and $x + B^0(t)$ are the same.
Thus $B^x(t) - x$ is Brownian motion started at 0, and $B^0(t) + x$ is Brownian
motion started at $x$, in other words

$$B^x(t) = x + B^0(t). \qquad (3.5)$$

The property (3.5) is called the space homogeneous property of Brownian
motion.


Definition 3.1 A stochastic process is called space-homogeneous if its
finite-dimensional distributions do not change with a shift in space, namely if

$$P(X(t_1) \le x_1, X(t_2) \le x_2, \ldots, X(t_n) \le x_n \,|\, X(0) = 0)$$
$$= P(X(t_1) \le x_1 + x, X(t_2) \le x_2 + x, \ldots, X(t_n) \le x_n + x \,|\, X(0) = x).$$
Four realizations of Brownian motion $B = B(t)$ started at 0 are exhibited
in Figure 3.1. Although it is a process governed by pure chance with zero
mean, it has regions where motion looks like it has "trends".

Figure 3.1: Four realizations or paths of Brownian motion $B(t)$.


Brownian Motion as a Gaussian Process 


Recall that a process is called Gaussian if all its finite-dimensional distributions 
are multivariate Normal. 


Example 3.3: Let random variables $X$ and $Y$ be independent Normal with
distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. Then the distribution of $(X, X + Y)$
is bivariate Normal with mean vector $(\mu_1, \mu_1 + \mu_2)$ and covariance matrix

$$\begin{pmatrix} \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 + \sigma_2^2 \end{pmatrix}.$$

To see this let $Z = (Z_1, Z_2)$ have standard Normal components, then it is
easy to see that

$$(X, X + Y) = \mu + AZ,$$

where $\mu = (\mu_1, \mu_1 + \mu_2)$, and matrix $A = \begin{pmatrix} \sigma_1 & 0 \\ \sigma_1 & \sigma_2 \end{pmatrix}$. The result follows by
the definition of the general Normal distribution as a linear transformation of
standard Normals (see Section 2.6).
Similarly to the above example, the following representation

$$(B(t_1), B(t_2), \ldots, B(t_n))$$
$$= \big(B(t_1),\ B(t_1) + (B(t_2) - B(t_1)),\ \ldots,\ B(t_{n-1}) + (B(t_n) - B(t_{n-1}))\big)$$

shows that this vector is a linear transformation of the standard Normal vector,
hence it has a multivariate Normal distribution.

Let $Y_1 = B(t_1)$, and for $k > 1$, $Y_k = B(t_k) - B(t_{k-1})$. Then by the
property of independence of increments of Brownian motion, the $Y_k$'s are independent.
They also have Normal distribution, $Y_1 \sim N(0, t_1)$, and $Y_k \sim N(0, t_k - t_{k-1})$.
Thus $(B(t_1), B(t_2), \ldots, B(t_n))$ is a linear transformation of $(Y_1, Y_2, \ldots, Y_n)$.
But $Y_1 = \sqrt{t_1}\, Z_1$, and $Y_k = \sqrt{t_k - t_{k-1}}\, Z_k$, where the $Z_k$'s are independent
standard Normal. Thus $(B(t_1), B(t_2), \ldots, B(t_n))$ is a linear transformation of
$(Z_1, \ldots, Z_n)$. Finding the matrix $A$ of this transformation is left as an
exercise (Exercise 3.7).


Definition 3.2 The covariance function of the process $X(t)$ is defined by

$$\gamma(s, t) = Cov(X(t), X(s)) = E(X(t) - EX(t))(X(s) - EX(s))$$
$$= E(X(t)X(s)) - EX(t)EX(s). \qquad (3.6)$$


The next result characterizes Brownian motion as a particular Gaussian 
process. 


Theorem 3.3 A Brownian motion is a Gaussian process with zero mean func- 
tion, and covariance function min(t, s). Conversely, a Gaussian process with 
zero mean function, and covariance function min(t, s) is a Brownian motion. 


PROOF: Since the mean of the Brownian motion is zero,

$$\gamma(s, t) = Cov(B(t), B(s)) = E(B(t)B(s)).$$

If $t < s$ then $B(s) = B(t) + B(s) - B(t)$, and

$$E(B(t)B(s)) = EB^2(t) + E(B(t)(B(s) - B(t))) = EB^2(t) = t,$$

where we used the independence of increments property. Similarly if $t > s$,
$E(B(t)B(s)) = s$. Therefore

$$E(B(t)B(s)) = \min(t, s).$$

To show the converse, let $t$ be arbitrary and $s > 0$. $X(t)$ is a Gaussian
process, thus the joint distribution of $X(t)$, $X(t + s)$ is a bivariate Normal, and
by conditions has zero mean. Therefore the vector $(X(t), X(t + s) - X(t))$ is also
bivariate Normal. The variables $X(t)$ and $X(t + s) - X(t)$ are uncorrelated,
using that $Cov(X(t), X(t + s)) = \min(t, t + s) = t$,

$$Cov(X(t), X(t+s) - X(t)) = Cov(X(t), X(t+s)) - Cov(X(t), X(t)) = t - t = 0.$$

A property of the multivariate Normal distribution implies that these variables
are independent. Thus the increment $X(t + s) - X(t)$ is independent of $X(t)$
and has $N(0, s)$ distribution. Therefore it is a Brownian motion.


Example 3.4: We find the distribution of $B(1) + B(2) + B(3) + B(4)$.

Consider the random vector $X = (B(1), B(2), B(3), B(4))$. Since Brownian motion is
a Gaussian process, all its finite-dimensional distributions are Normal, in particular $X$
has a multivariate Normal distribution with mean vector zero and covariance matrix
given by $\sigma_{ij} = Cov(X_i, X_j)$. For example, $Cov(X_1, X_3) = Cov(B(1), B(3)) = 1$.

$$\Sigma = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 \\ 1 & 2 & 3 & 4 \end{pmatrix}$$

Now, let $a = (1, 1, 1, 1)$. Then

$$aX = X_1 + X_2 + X_3 + X_4 = B(1) + B(2) + B(3) + B(4).$$

$aX$ has a Normal distribution with mean zero and variance $a\Sigma a'$, and in this case
the variance is given by the sum of the elements of the covariance matrix. Thus
$B(1) + B(2) + B(3) + B(4)$ has a Normal distribution with mean zero and variance
30. Alternatively, we can calculate the variance of the sum by the formula

$$Var(X_1 + X_2 + X_3 + X_4)$$
$$= Cov(X_1 + X_2 + X_3 + X_4,\ X_1 + X_2 + X_3 + X_4) = \sum_{i,j} Cov(X_i, X_j) = 30.$$
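The variance computation in this example is easy to reproduce for any set of time points, using the fact that the covariance of Brownian motion at times s and t is min(s, t). The helper names below are hypothetical and the sketch uses plain Python lists rather than a matrix library.

```python
def brownian_cov_matrix(times):
    """Covariance matrix of (B(t_1), ..., B(t_n)): entries min(t_i, t_j)."""
    return [[min(s, t) for t in times] for s in times]

def variance_of_linear_combination(a, cov):
    """Variance of a_1 X_1 + ... + a_n X_n, i.e. the quadratic form a cov a'."""
    n = len(a)
    return sum(a[i] * cov[i][j] * a[j] for i in range(n) for j in range(n))

sigma = brownian_cov_matrix([1, 2, 3, 4])
var_sum = variance_of_linear_combination([1, 1, 1, 1], sigma)
print(sigma)
print(var_sum)  # variance of B(1) + B(2) + B(3) + B(4)

# The same helpers applied to the times 1/4, 1/2, 3/4, 1.
sigma_quarter = brownian_cov_matrix([0.25, 0.5, 0.75, 1.0])
var_sum_quarter = variance_of_linear_combination([1, 1, 1, 1], sigma_quarter)
print(var_sum_quarter)
```

Changing the coefficient vector gives the variance of any other linear combination of the same values of the process.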


Example 3.5: To illustrate the use of scaling, we find the distribution of
$B(\frac14) + B(\frac12) + B(\frac34) + B(1)$. Consider the random vector $Y = (B(\frac14), B(\frac12), B(\frac34), B(1))$.
It is easy to see that $Y$ and $\frac12 X$, where $X = (B(1), B(2), B(3), B(4))$, have the same
law. Therefore its covariance matrix is given by $\frac14 \Sigma$, with $\Sigma$ as above. Consequently,
$aY$ has a Normal distribution with mean zero and variance $30/4$.


Example 3.6: We find the probability $P(\int_0^1 B(t)\,dt > \frac{2}{\sqrt{3}})$.
Comment first that since Brownian motion has continuous paths, the Riemann
integral $\int_0^1 B(t)\,dt$ is well defined for any random path as we integrate path by path.
To find the required probability we need to know the distribution of $\int_0^1 B(t)\,dt$. This
can be obtained as a limit of the distributions of the approximating sums,

$$\sum B(t_i)\Delta,$$

where points $t_i$ partition $[0,1]$ and $\Delta = t_{i+1} - t_i$. If, for example, $t_i = i/n$, then for
$n = 4$ the approximating sum is $\frac14 (B(\frac14) + B(\frac12) + B(\frac34) + B(1))$, the distribution of
which was found in the previous example to be $N(0, \frac{15}{32})$. Similarly, the distribution
of all of the approximating sums is Normal with zero mean. It can be shown that
the limit of Gaussian distributions is a Gaussian distribution. Thus $\int_0^1 B(t)\,dt$ has
a Normal distribution with zero mean. Therefore it only remains to compute its
variance.

$$Var\left(\int_0^1 B(t)\,dt\right) = Cov\left(\int_0^1 B(t)\,dt, \int_0^1 B(s)\,ds\right)$$
$$= E\left(\int_0^1 B(t)\,dt \int_0^1 B(s)\,ds\right) = \int_0^1 \int_0^1 E(B(t)B(s))\,dt\,ds$$
$$= \int_0^1 \int_0^1 Cov(B(t), B(s))\,dt\,ds = \int_0^1 \int_0^1 \min(t, s)\,dt\,ds = 1/3.$$

Exchanging the integrals and expectation is justified by Fubini's theorem since

$$\int_0^1 \int_0^1 E|B(t)B(s)|\,dt\,ds \le \int_0^1 \int_0^1 \sqrt{ts}\,dt\,ds < 1.$$

Thus $\int_0^1 B(t)\,dt$ has $N(0, 1/3)$ distribution, and the desired probability is
approximately 0.025. Later we shall prove that the distribution of the integral $\int_0^a B(t)\,dt$ is
Normal $N(0, a^3/3)$ by considering a transformation to Itô integral, see 6.4.
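The passage from the approximating sums to the variance 1/3 can be followed numerically. The sketch below computes the exact variance of the Riemann sum $\frac{1}{n}\sum_{i=1}^{n} B(i/n)$ using only $Cov(B(s), B(t)) = \min(s, t)$; the function name and the chosen values of n are hypothetical, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

def riemann_sum_variance(n):
    """Exact variance of (1/n) * sum_{i=1}^n B(i/n).

    Uses Cov(B(i/n), B(j/n)) = min(i, j) / n and the factor (1/n)^2
    coming from the weights of the Riemann sum.
    """
    total = sum(Fraction(min(i, j), n)
                for i in range(1, n + 1)
                for j in range(1, n + 1))
    return total / n**2

for n in (4, 16, 64):
    v = riemann_sum_variance(n)
    print(n, v, float(v))
```

For n = 4 the value agrees with the variance of the four-term sum scaled by 1/16, and the values decrease towards 1/3 as the partition is refined.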


Brownian Motion as a Random Series 


The process

$$B(t) = \frac{\xi_0}{\sqrt{\pi}}\, t + \sqrt{\frac{2}{\pi}} \sum_{j=1}^{\infty} \frac{\sin(jt)}{j}\, \xi_j, \qquad (3.7)$$

where the $\xi_j$'s, $j = 0, 1, \ldots$, are independent standard Normal random variables,
is Brownian motion on $[0, \pi]$. Convergence of the series is understood almost
surely. This representation resembles the example of a continuous but nowhere
differentiable function, Example 1.2. One can prove the assertion by showing
that the partial sums converge uniformly, and verifying that the process in
(3.7) is Gaussian, has zero mean, and covariance $\min(s, t)$ (see, for example,
Breiman (1968), p.261, Itô and McKean (1965), p.22).
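The covariance claim can be checked numerically. The sketch below assumes the series has the form $B(t) = \xi_0 t/\sqrt{\pi} + \sqrt{2/\pi} \sum_{j \ge 1} \xi_j \sin(jt)/j$ on $[0, \pi]$ with independent standard Normal $\xi_j$, so that its covariance is $st/\pi + (2/\pi)\sum_{j \ge 1} \sin(js)\sin(jt)/j^2$; the function name and the truncation level are hypothetical choices.

```python
import math

def series_covariance(s, t, terms=20_000):
    """Truncated covariance of the random series at times s and t in [0, pi]."""
    total = s * t / math.pi
    for j in range(1, terms + 1):
        total += (2.0 / math.pi) * math.sin(j * s) * math.sin(j * t) / (j * j)
    return total

for s, t in [(1.0, 2.5), (0.5, 0.5), (3.0, 0.2)]:
    print(s, t, series_covariance(s, t), min(s, t))
```

The truncated sums agree with min(s, t) up to an error of the order of one over the number of terms.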


Remark 3.2: A similar, more general representation of a Brownian motion is
given by using an orthonormal sequence of functions on $[0, T]$, $h_j(t)$. $B(t) =
\sum_{j=0}^{\infty} \xi_j H_j(t)$, where $H_j(t) = \int_0^t h_j(s)\,ds$, is a Brownian motion on $[0, T]$.
3.2 Properties of Brownian Motion Paths


An occurrence of Brownian motion observed from time 0 to time T, is a random 
function of t on the interval [0, T]. It is called a realization, a path or trajectory. 


Quadratic Variation of Brownian Motion 


The quadratic variation of Brownian motion $[B, B](t)$ is defined as

$$[B, B](t) = [B, B]([0, t]) = \lim \sum_{i=1}^{n} |B(t_i^n) - B(t_{i-1}^n)|^2, \qquad (3.8)$$

where the limit is taken over all shrinking partitions of $[0, t]$, with $\delta_n =
\max_i (t_i^n - t_{i-1}^n) \to 0$ as $n \to \infty$. It is remarkable that although the sums
in the definition (3.8) are random, their limit is non-random, as the following
result shows.


Theorem 3.4 Quadratic variation of a Brownian motion over [0,t] is t. 


PROOF: We give the proof for a sequence of partitions, for which $\sum_n \delta_n <
\infty$. An example of such is when the interval is divided into two, then each
subinterval is divided into two, etc. Let $T_n = \sum_i |B(t_i^n) - B(t_{i-1}^n)|^2$. It is easy
to see that

$$E(T_n) = E \sum_{i=1}^{n} |B(t_i^n) - B(t_{i-1}^n)|^2 = \sum_{i=1}^{n} (t_i^n - t_{i-1}^n) = t - 0 = t.$$

Since the fourth moment of the $N(0, \sigma^2)$ distribution is $3\sigma^4$, the variance of
each squared increment is at most $3(t_i^n - t_{i-1}^n)^2$, and by independence of
increments we obtain for the variance of $T_n$

$$Var(T_n) = Var\Big(\sum_i |B(t_i^n) - B(t_{i-1}^n)|^2\Big) = \sum_i Var\big((B(t_i^n) - B(t_{i-1}^n))^2\big)$$

$$\le \sum_i 3(t_i^n - t_{i-1}^n)^2 \le 3 \max_i (t_i^n - t_{i-1}^n)\, t = 3t\delta_n.$$

Therefore $\sum_{n=1}^{\infty} Var(T_n) < \infty$. Using the monotone convergence theorem, we find
$E \sum_{n=1}^{\infty} (T_n - ET_n)^2 < \infty$. This implies that the series inside the expectation
converges almost surely. Hence its terms converge to zero, and $T_n - ET_n \to 0$
a.s., consequently $T_n \to t$ a.s.

It is possible to show that $T_n \to t$ a.s. for any sequence of partitions which
are successive refinements and satisfy $\delta_n \to 0$ as $n \to \infty$ (see for example,
Loeve (1978), Vol. 2, p.253 for the proof, or Breiman (1968)).
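The concentration of the sums of squared increments around t can be observed in a simulation. The sketch below is an illustration only: for each n it draws a fresh set of n independent $N(0, t/n)$ increments (so the partitions are not refinements of one common path), and the function name, seed and partition sizes are hypothetical.

```python
import math
import random

def quadratic_variation_sum(t, n, rng):
    """Sum of squared increments of a simulated Brownian path over n equal
    subintervals of [0, t]; each increment is drawn as N(0, t/n)."""
    dt = t / n
    return sum(rng.gauss(0.0, math.sqrt(dt)) ** 2 for _ in range(n))

rng = random.Random(0)
for k in (4, 8, 12, 16):
    n = 2 ** k
    print(n, quadratic_variation_sum(1.0, n, rng))
```

The printed sums fluctuate around 1, and the fluctuations shrink as n grows, in line with a variance of order $t\delta_n$.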
Varying $t$, the quadratic variation process of Brownian motion is $t$. Note that
the classical quadratic variation of Brownian paths (defined as the supremum
over all partitions of sums in (3.8), see Chapter 1) is infinite (e.g. Freedman
(1971), p.48).


Properties of Brownian paths 


B(t)’s as functions of t have the following properties. Almost every sample 
path B(t),0<t<T 


1. is a continuous function of t; 

2. is not monotone in any interval, no matter how small the interval is; 
3. is not differentiable at any point; 

4. has infinite variation on any interval, no matter how small it is; 


5. has quadratic variation on [0, t] equal to t, for any t. 


Properties 1 and 3 of Brownian motion paths state that although any
realization $B(t)$ is a continuous function of $t$, it has increments $\Delta B(t)$ over an interval
of length $\Delta t$ much larger than $\Delta t$ as $\Delta t \to 0$. Since $E(B(t + \Delta t) - B(t))^2 = \Delta t$,
it suggests that the increment is roughly like $\sqrt{\Delta t}$. This is made precise by
the quadratic variation Property 5.

Note that by Theorem 1.10, a positive quadratic variation implies infinite 
variation, so that Property 4 follows from Property 5. Since a monotone 
function has finite variation, Property 2 follows from Property 4. 

By Theorem 1.8 a continuous function with a bounded derivative is of
finite variation. Therefore it follows from Property 4 that $B(t)$ cannot have
a bounded derivative on any interval, no matter how small the interval is.
This is not yet non-differentiability at every point, but it is close to it. For
the proof of the result that with probability one Brownian motion paths are
nowhere differentiable (due to Dvoretzky, Erdős and Kakutani) see Breiman
(1968) p.261. Here we show a simpler statement.


Theorem 3.5 For any t almost all trajectories of Brownian motion are not 
differentiable at t. 


PROOF: Consider $\frac{B(t+\Delta) - B(t)}{\Delta} = \frac{\sqrt{\Delta}\,Z}{\Delta} = \frac{Z}{\sqrt{\Delta}}$, for some standard Normal
random variable $Z$. Thus the ratio converges to $\infty$ in distribution, since
$P(|Z/\sqrt{\Delta}| > K) \to 1$ for any $K$, as $\Delta \to 0$, precluding existence of the
derivative at $t$.


To realize the above argument on a computer take e.g. $\Delta = 10^{-20}$. Then
$\Delta B(t) = 10^{-10}Z$, and $\Delta B(t)/\Delta = 10^{10}Z$, which is very large in absolute
value with overwhelming probability.
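
The probability used in the proof has a closed form, $P(|Z|/\sqrt{\Delta} > K) = 2(1 - \Phi(K\sqrt{\Delta}))$, so the limit can be tabulated directly. The sketch below is illustrative only; the function name `prob_ratio_exceeds` and the chosen values of $K$ and $\Delta$ are assumptions.

```python
from statistics import NormalDist


def prob_ratio_exceeds(K, delta):
    """P(|Z| / sqrt(delta) > K) for a standard Normal Z."""
    threshold = K * delta ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(threshold))


# For a fixed level K the probability increases to 1 as delta decreases to 0.
for delta in (1e-2, 1e-6, 1e-10, 1e-20):
    print(delta, prob_ratio_exceeds(K=1000.0, delta=delta))
```

Even for a level as large as $K = 1000$, the probability is essentially one once $\Delta$ is of order $10^{-10}$ or smaller.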


3.3 Three Martingales of Brownian Motion


In this section three main martingales associated with Brownian motion are 
given. Recall the definition of a martingale. 


Definition 3.6 A stochastic process $\{X(t), t \ge 0\}$ is a martingale if for any
$t$ it is integrable, $E|X(t)| < \infty$, and for any $s > 0$

$$E(X(t+s)|\mathcal{F}_t) = X(t), \quad \text{a.s.} \qquad (3.9)$$

where $\mathcal{F}_t$ is the information about the process up to time $t$, and the equality
holds almost surely.


The martingale property means that if we know the values of the process up
to time $t$, and $X(t) = x$, then the expected future value at any future time
is $x$.


Remark 3.3: $\mathcal{F}_t$ represents information available to an observer at time $t$. A
set $A \in \mathcal{F}_t$ if and only if by observing the process up to time $t$ one can decide
whether or not $A$ has occurred. Formally, $\mathcal{F}_t = \sigma(X(s), 0 \le s \le t)$ denotes
the $\sigma$-field ($\sigma$-algebra) generated by the values of the process up to time $t$.


Remark 3.4: As the conditional expectation given a $\sigma$-field is defined as a
random variable (see for example, Section 2.7), all the relations involving
conditional expectations, such as equalities and inequalities, must be understood
in the almost sure sense. This will always be assumed, and the almost sure
“a.s.” specification will be frequently dropped.


Examples of martingales constructed from Brownian motion are given in 
the next result. 


Theorem 3.7 Let $B(t)$ be Brownian Motion. Then

1. $B(t)$ is a martingale.

2. $B(t)^2 - t$ is a martingale.

3. For any $u$, $e^{uB(t) - \frac{u^2}{2}t}$ is a martingale.


PROOF: The key idea in establishing the martingale property is that for any
function $g$, the conditional expectation of $g(B(t+s) - B(t))$ given $\mathcal{F}_t$ equals
the unconditional one,

$$E(g(B(t+s) - B(t))|\mathcal{F}_t) = E(g(B(t+s) - B(t))), \qquad (3.10)$$

due to independence of $B(t+s) - B(t)$ and $\mathcal{F}_t$. The latter expectation is just
$Eg(X)$, where $X$ is a Normal $N(0,s)$ random variable.

1. By definition, $B(t) \sim N(0,t)$, so that $B(t)$ is integrable with $E(B(t)) = 0$.

$$E(B(t+s)|\mathcal{F}_t) = E(B(t) + (B(t+s) - B(t))|\mathcal{F}_t)$$
$$= E(B(t)|\mathcal{F}_t) + E(B(t+s) - B(t)|\mathcal{F}_t)$$
$$= B(t) + E(B(t+s) - B(t)) = B(t).$$

2. By definition, $E(B^2(t)) = t < \infty$, therefore $B^2(t)$ is integrable. Since

$$B^2(t+s) = (B(t) + B(t+s) - B(t))^2$$
$$= B^2(t) + 2B(t)(B(t+s) - B(t)) + (B(t+s) - B(t))^2,$$

we have

$$E(B^2(t+s)|\mathcal{F}_t)$$
$$= B^2(t) + 2E(B(t)(B(t+s) - B(t))|\mathcal{F}_t) + E((B(t+s) - B(t))^2|\mathcal{F}_t)$$
$$= B^2(t) + s,$$

where we used that $B(t+s) - B(t)$ is independent of $\mathcal{F}_t$ and has mean 0, and
(3.10) with $g(x) = x^2$. Subtracting $(t+s)$ from both sides gives the martingale
property of $B^2(t) - t$.

3. Consider the moment generating function of $B(t)$,

$$E(e^{uB(t)}) = e^{tu^2/2} < \infty,$$

since $B(t)$ has the $N(0,t)$ distribution. This implies integrability of $e^{uB(t) - tu^2/2}$,
moreover

$$E(e^{uB(t) - tu^2/2}) = 1.$$

The martingale property is established by using (3.10) with $g(x) = e^{ux}$.

$$E(e^{uB(t+s)}|\mathcal{F}_t) = E(e^{uB(t) + u(B(t+s) - B(t))}|\mathcal{F}_t)$$
$$= e^{uB(t)} E(e^{u(B(t+s) - B(t))}|\mathcal{F}_t) \quad (\text{since } B(t) \text{ is } \mathcal{F}_t\text{-measurable})$$
$$= e^{uB(t)} E(e^{u(B(t+s) - B(t))}) \quad (\text{since the increment is independent of } \mathcal{F}_t)$$
$$= e^{uB(t)} e^{\frac{u^2}{2}s}.$$

The martingale property of $e^{uB(t) - tu^2/2}$ is obtained by multiplying both sides
by $e^{-\frac{u^2}{2}(t+s)}$.


Remark 3.5: All three martingales have a central place in the theory. The
martingale $B^2(t) - t$ provides a characterization (Lévy's characterization) of
Brownian motion. It will be seen later that if a process $X(t)$ is a continuous
martingale such that $X^2(t) - t$ is also a martingale, then $X(t)$ is Brownian
motion. The martingale $e^{uB(t) - tu^2/2}$ is known as the exponential martingale,
and as it is related to the moment generating function, it is used for establishing
distributional properties of the process.
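
The three martingale properties can be checked by simulation: given $B(t) = b$, the future value is $B(t+s) = b + X$ with $X \sim N(0,s)$, and the conditional mean of each process at time $t+s$ should equal its value at time $t$. The following Monte Carlo sketch is an illustration under assumed values of $b$, $t$, $s$ and $u$; the function name `conditional_means` is hypothetical.

```python
import math
import random


def conditional_means(b, t, s, u, n_samples=200_000, seed=1):
    """Monte Carlo estimates of E(M(t+s) | B(t) = b) for the three martingales."""
    rng = random.Random(seed)
    m1 = m2 = m3 = 0.0
    for _ in range(n_samples):
        # Given B(t) = b, B(t+s) = b + increment, with increment ~ N(0, s).
        future = b + rng.gauss(0.0, math.sqrt(s))
        m1 += future
        m2 += future ** 2 - (t + s)
        m3 += math.exp(u * future - 0.5 * u ** 2 * (t + s))
    return m1 / n_samples, m2 / n_samples, m3 / n_samples


b, t, s, u = 0.7, 1.0, 0.5, 0.8
estimates = conditional_means(b, t, s, u)
# Values of the three processes at time t, given B(t) = b.
targets = (b, b ** 2 - t, math.exp(u * b - 0.5 * u ** 2 * t))
for est, target in zip(estimates, targets):
    print(round(est, 3), round(target, 3))
```

Each printed pair should agree up to Monte Carlo error, which is of order $10^{-3}$ for this sample size.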


3.4 Markov Property of Brownian Motion 


The Markov Property states that if we know the present state of the process, 
then the future behaviour of the process is independent of its past. The process 
$X(t)$ has the Markov property if the conditional distribution of $X(t+s)$ given
$X(t) = x$ does not depend on the past values (but it may depend on the
present value $x$). The process “does not remember” how it got to the present
state $x$. Let $\mathcal{F}_t$ denote the $\sigma$-field generated by the process up to time $t$.


Definition 3.8 $X$ is a Markov process if for any $t$ and $s > 0$, the conditional
distribution of $X(t+s)$ given $\mathcal{F}_t$ is the same as the conditional distribution of
$X(t+s)$ given $X(t)$, that is,

$$P(X(t+s) \le y|\mathcal{F}_t) = P(X(t+s) \le y|X(t)), \quad \text{a.s.} \qquad (3.11)$$


Theorem 3.9 Brownian motion $B(t)$ possesses the Markov property.


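PROOF: It is easy to see by using the moment generating function that the
conditional distribution of $B(t+s)$ given $\mathcal{F}_t$ is the same as that given $B(t)$.
Indeed,

$$E(e^{uB(t+s)}|\mathcal{F}_t) = e^{uB(t)} E(e^{u(B(t+s) - B(t))}|\mathcal{F}_t)$$
$$= e^{uB(t)} E(e^{u(B(t+s) - B(t))}) \quad (\text{since } e^{u(B(t+s) - B(t))} \text{ is independent of } \mathcal{F}_t)$$
$$= e^{uB(t)} e^{u^2 s/2} \quad (\text{since } B(t+s) - B(t) \text{ is } N(0,s))$$
$$= e^{uB(t)} E(e^{u(B(t+s) - B(t))}|B(t)) = E(e^{uB(t+s)}|B(t)).$$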


The transition probability function of a Markov process $X$ is defined as

$$P(y,t,x,s) = P(X(t) \le y|X(s) = x),$$

the conditional distribution function of the process at time $t$, given that it is
at point $x$ at time $s < t$. It is possible to choose them so that for any fixed
$x$ they are true probabilities on the line. In the case of Brownian motion it is
given by the distribution function of the Normal $N(x, t-s)$ distribution

$$P(y,t,x,s) = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi(t-s)}} e^{-\frac{(u-x)^2}{2(t-s)}} du.$$

The transition probability function of Brownian motion satisfies $P(y,t,x,s) =
P(y,t-s,x,0)$. In other words,

$$P(B(t) \le y|B(s) = x) = P(B(t-s) \le y|B(0) = x). \qquad (3.12)$$

For fixed $x$ and $t$, $P(y,t,x,0)$ has the density $p_t(x,y)$ given by (3.3). The
property (3.12) states that Brownian motion is time-homogeneous, that is, its
distributions do not change with a shift in time. For example, the distribution
of $B(t)$ given $B(s) = x$ is the same as that of $B(t-s)$ given $B(0) = x$.
It follows from (3.12) and (3.4) that all finite-dimensional distributions of
Brownian motion are time-homogeneous.

In what follows $P_x$ denotes the conditional probability given $B(0) = x$.
More information on transition functions is given in Section 5.5.


Stopping Times and Strong Markov Property 


Definition 3.10 A random time $T$ is called a stopping time for $B(t)$, $t \ge 0$,
if for any $t$ it is possible to decide whether $T$ has occurred or not by observing
$B(s)$, $0 \le s \le t$. More rigorously, for any $t$ the sets $\{T \le t\} \in \mathcal{F}_t$, the $\sigma$-field
generated by $B(s)$, $0 \le s \le t$.


Example 3.7: Examples of stopping times and random times. 


1. Any non-random time $T$ is a stopping time. Formally, $\{T \le t\}$ is either $\emptyset$
or $\Omega$, which are members of $\mathcal{F}_t$ for any $t$.

2. Let $T$ be the first time $B(t)$ takes value (hits) 1. Then $T$ is a stopping time.
Clearly, if we know $B(s)$ for all $s \le t$ then we know whether the Brownian
motion took value 1 before or at $t$ or not. Thus we know whether $\{T \le t\}$ has
occurred or not just by observing the past of the process prior to $t$. Formally,
$\{T > t\} = \{B(u) < 1, \text{ for all } u \le t\} \in \mathcal{F}_t$, and hence its complement
$\{T \le t\} \in \mathcal{F}_t$.

3. Similarly, the first passage time of level $a$, $T_a = \inf\{t > 0 : B(t) = a\}$, is a
stopping time.

4. Let $T$ be the time when Brownian motion reaches its maximum on the interval
$[0,1]$. Then clearly, to decide whether $\{T \le t\}$ has occurred or not, it is not
enough to know the values of the process prior to $t$; one needs to know all the
values on the interval $[0,1]$. So $T$ is not a stopping time.

5. Let $T$ be the last zero of Brownian motion before time $t = 1$. Then $T$ is not
a stopping time, since if $T \le t$, then there are no zeros in $(t,1]$, which is an
event that is decided by observing the process up to time 1, and this set does
not belong to $\mathcal{F}_t$.


The strong Markov property is similar to the Markov property, except that in
the definition a fixed time $t$ is replaced by a stopping time $T$.


Theorem 3.11 Brownian motion $B(t)$ has the Strong Markov property: for
any finite stopping time $T$ the regular conditional distribution of $B(T+t)$, $t \ge 0$,
given $\mathcal{F}_T$ is $P_{B(T)}$, that is,

$$P(B(T+t) \le y|\mathcal{F}_T) = P(B(T+t) \le y|B(T)) \quad \text{a.s.}$$


Corollary 3.12 Let $T$ be a finite stopping time. Define the new process in
$t \ge 0$ by

$$\hat{B}(t) = B(T+t) - B(T). \qquad (3.13)$$

Then $\hat{B}(t)$ is a Brownian motion started at zero and independent of $\mathcal{F}_T$.


We do not give the proof of the strong Markov property here; it can be found,
for example, in Rogers and Williams (1994) p.21, and can be done by using the
exponential martingale and the Optional Stopping Theorem given in Chapter 7.

Note that the strong Markov property applies only when $T$ is a stopping
time. If $T$ is just a random time, then $B(T+t) - B(T)$ need not be Brownian
motion.


3.5 Hitting Times and Exit Times 


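Let $T_x$ denote the first time $B(t)$ hits level $x$, $T_x = \inf\{t > 0 : B(t) = x\}$.
Denote the time to exit an interval $(a,b)$ by $\tau = \min(T_a, T_b)$.


Theorem 3.13 Let $a < x < b$, and $\tau = \min(T_a, T_b)$. Then $P_x(\tau < \infty) = 1$
and $E_x\tau < \infty$.


PROOF: $\{\tau > 1\} = \{a < B(s) < b, \text{ for all } 0 \le s \le 1\} \subset \{a < B(1) < b\}$.
Therefore we have

$$P_x(\tau > 1) \le P_x(B(1) \in (a,b)) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-y)^2}{2}} dy.$$

The function $P_x(B(1) \in (a,b))$ is continuous in $x$ on $[a,b]$, hence it reaches
its maximum $\theta < 1$. By using the strong Markov property we can show that
$P_x(\tau > n) \le \theta^n$. For any non-negative random variable $X \ge 0$, $EX \le
\sum_{n=0}^{\infty} P(X > n)$ (see Exercise 3.2). Therefore,

$$E_x\tau \le \sum_{n=0}^{\infty} \theta^n = \frac{1}{1-\theta} < \infty.$$

The bound on $P_x(\tau > n)$ is established as follows:

$$P_x(\tau > n) = P_x(B(s) \in (a,b),\ 0 \le s \le n)$$
$$= P_x(B(s) \in (a,b),\ 0 \le s \le n-1,\ B(s) \in (a,b),\ n-1 \le s \le n)$$
$$= P_x(\tau > n-1,\ B(s) \in (a,b),\ n-1 \le s \le n)$$
$$= P_x(\tau > n-1,\ B(n-1) + \hat{B}(s) \in (a,b),\ 0 \le s \le 1) \quad \text{by (3.13)}$$
$$= E\big(P_x(\tau > n-1,\ \hat{B}^y(s) \in (a,b),\ 0 \le s \le 1\,|\,B(n-1) = y)\big)$$
$$= E\big(P_x(\tau > n-1\,|\,B(n-1) = y)\,P_x(\hat{B}^y(s) \in (a,b),\ 0 \le s \le 1\,|\,B(n-1) = y)\big)$$
$$= E\big(P_x(\tau > n-1\,|\,B(n-1) = y)\,P_y(B(s) \in (a,b),\ 0 \le s \le 1)\big)$$
$$\le \max_y P_y(B(s) \in (a,b),\ 0 \le s \le 1)\,P_x(\tau > n-1)$$
$$\le \theta P_x(\tau > n-1) \le \theta^n, \quad \text{by iterations.}$$

Here $\hat{B}$ is the process (3.13) with $T = n-1$, and $\hat{B}^y(s) = y + \hat{B}(s)$ is the
same process started at $y$.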
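
The constant $\theta$ and the resulting bound $1/(1-\theta)$ are easy to evaluate numerically. The sketch below is illustrative: it maximizes $P_x(B(1) \in (a,b)) = \Phi(b-x) - \Phi(a-x)$ over a grid of starting points. The function name `theta`, the grid size and the interval $(-1, 2)$ are assumptions made for the example.

```python
from statistics import NormalDist


def theta(a, b, grid=1001):
    """Maximum over x in [a, b] of P_x(B(1) in (a, b)), found on a grid."""
    std_normal = NormalDist()
    best = 0.0
    for k in range(grid):
        x = a + (b - a) * k / (grid - 1)
        # Under P_x, B(1) ~ N(x, 1).
        prob = std_normal.cdf(b - x) - std_normal.cdf(a - x)
        best = max(best, prob)
    return best


a, b = -1.0, 2.0
th = theta(a, b)
bound = 1.0 / (1.0 - th)
print(th, bound)
```

The maximum is attained at the midpoint of the interval. The bound is crude: the exact mean exit time of Brownian motion from $(a,b)$ is known to be $(x-a)(b-x)$, a standard fact not derived in this section, which is at most $2.25$ here, well below the computed bound.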


The next result gives the recurrence property of Brownian motion.


Theorem 3.14

$$P_a(T_b < \infty) = 1, \quad P_a(T_a < \infty) = 1.$$


PROOF: The second statement follows from the first, since

$$P_a(T_a < \infty) \ge P_a(T_b < \infty)P_b(T_a < \infty) = 1.$$

We show now that $P_0(T_1 < \infty) = 1$; for other points the proof is similar.
Observe firstly that by the previous result and by symmetry, for any $a$ and $b$

$$P_{\frac{a+b}{2}}(T_a < T_b) = \frac{1}{2}.$$

Hence $P_0(T_{-1} < T_1) = \frac12$, $P_{-1}(T_{-3} < T_1) = \frac12$, $P_{-3}(T_{-7} < T_1) = \frac12$, etc.
Consider now $P_0(T_{-(2^n-1)} < T_1)$. Since the paths of Brownian motion are
continuous, to reach $-(2^n-1)$ the path must reach $-1$ first, then it must go
from $-1$ to $-3$, etc. Hence we obtain

$$P_0(T_{-(2^n-1)} < T_1) = P_0(T_{-1} < T_1)\,P_{-1}(T_{-3} < T_1) \cdots P_{-(2^{n-1}-1)}(T_{-(2^n-1)} < T_1) = \frac{1}{2^n}.$$

If $A_n$ denotes the event that Brownian motion hits $-(2^n-1)$ before it hits 1,
then we showed that $P(A_n) = 2^{-n}$. Notice that $A_n \subset A_{n-1}$, as if Brownian
motion hits $-(2^n-1)$ before 1, it also hits the points bigger than $-(2^n-1)$.
Thus

$$\bigcap_{i=1}^{n} A_i = A_n$$

and

$$P\Big(\bigcap_{i=1}^{\infty} A_i\Big) = \lim_{n\to\infty} P(A_n) = \lim_{n\to\infty} 2^{-n} = 0.$$

This implies that

$$P\Big(\bigcup_{n=1}^{\infty} A_n^c\Big) = 1.$$

In other words, with probability 1, one of the events complementary to $A_n$
occurs, that is, there is $n$ such that Brownian motion hits 1 before it hits
$-(2^n-1)$. This implies that $P_0(T_1 < \infty) = 1$. Another proof of this fact, that
uses properties of martingales, is given in Chapter 7.
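
The probabilities $P_0(T_{-(2^n-1)} < T_1) = 2^{-n}$ used in the proof can be illustrated by simulation. The sketch below replaces Brownian motion by a symmetric $\pm 1$ random walk started at 0; this substitution is an assumption of the example, justified by the fact that at integer levels the walk has the same hitting probabilities (the classical gambler's ruin result). The function names, the seed and the number of trials are illustrative.

```python
import random


def hits_lower_first(lower, upper, rng):
    """Run a symmetric +/-1 walk from 0 until it reaches `lower` or `upper`."""
    position = 0
    while lower < position < upper:
        position += 1 if rng.random() < 0.5 else -1
    return position == lower


def estimate_hit_probability(n, trials=20_000, seed=2):
    """Estimate P_0(T_{-(2^n - 1)} < T_1), with the walk standing in for B(t)."""
    rng = random.Random(seed)
    lower = -(2 ** n - 1)
    hits = sum(hits_lower_first(lower, 1, rng) for _ in range(trials))
    return hits / trials


for n in (1, 2, 3):
    print(n, estimate_hit_probability(n), 2.0 ** -n)
```

The estimated frequencies halve with each increase of $n$, which is the mechanism that forces $P(A_n) \to 0$ in the proof.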


3.6 Maximum and Minimum of Brownian Motion


In this section we establish the distribution of the maximum and the minimum
of Brownian motion on $[0,t]$,

$$M(t) = \max_{0 \le s \le t} B(s) \quad \text{and} \quad m(t) = \min_{0 \le s \le t} B(s),$$

as well as the distribution of the first hitting (passage) time of $x$,
$T_x = \inf\{t > 0 : B(t) = x\}$.


Theorem 3.15 For any $x > 0$,

$$P_0(M(t) \ge x) = 2P_0(B(t) \ge x) = 2\Big(1 - \Phi\Big(\frac{x}{\sqrt{t}}\Big)\Big),$$

where $\Phi(x)$ stands for the standard Normal distribution function.


PROOF: Notice that the events $\{M(t) \ge x\}$ and $\{T_x \le t\}$ are the same.
Indeed, if the maximum at time $t$ is at least $x$, then at some time before $t$
Brownian motion took value $x$, and if Brownian motion took value $x$ at some
time before $t$, then the maximum will be at least $x$. Since

$$\{B(t) \ge x\} \subset \{T_x \le t\},$$

$$P(B(t) \ge x) = P(B(t) \ge x,\ T_x \le t).$$

As $B(T_x) = x$,

$$P(B(t) \ge x) = P(T_x \le t,\ B(T_x + (t - T_x)) - B(T_x) \ge 0).$$

By Theorem 3.14, $T_x$ is a finite stopping time, and by the strong Markov
property (3.13), the random variable $\hat{B}(s) = B(T_x + s) - B(T_x)$ is independent
of $\mathcal{F}_{T_x}$ and has a Normal distribution, so we have

$$P(B(t) \ge x) = P(T_x \le t,\ \hat{B}(t - T_x) \ge 0). \qquad (3.14)$$

If we had $s$ independent of $T_x$, then

$$P(T_x \le t,\ \hat{B}(s) \ge 0) = P(T_x \le t)P(\hat{B}(s) \ge 0)$$
$$= P(T_x \le t)\tfrac{1}{2} = P(M(t) \ge x)\tfrac{1}{2}, \qquad (3.15)$$

and we are done. But in (3.14) $s = t - T_x$, and is clearly dependent on $T_x$. It
is not easy to show that

$$P(B(t) \ge x) = P(T_x \le t,\ \hat{B}(t - T_x) \ge 0)$$
$$= P(T_x \le t)\tfrac{1}{2} = P(M(t) \ge x)\tfrac{1}{2}.$$

The proof can be found, for example, in Dudley (1989) p.361.
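
The law of the maximum can be compared with a simulation. The sketch below approximates Brownian paths on a discrete grid, which is an assumption of the example: the maximum over grid points is slightly smaller than the true maximum, so the estimate is biased downwards by an amount that shrinks with the step size. The function name `prob_max_exceeds` and the parameters are illustrative.

```python
import math
import random
from statistics import NormalDist


def prob_max_exceeds(x, t, steps=400, trials=4000, seed=3):
    """Estimate P(M(t) >= x) from discretised Brownian paths started at 0."""
    rng = random.Random(seed)
    sd = math.sqrt(t / steps)
    count = 0
    for _ in range(trials):
        position = 0.0
        for _ in range(steps):
            position += rng.gauss(0.0, sd)
            if position >= x:  # the running maximum has reached x
                count += 1
                break
    return count / trials


x, t = 1.0, 1.0
estimate_max = prob_max_exceeds(x, t)
exact_max = 2.0 * (1.0 - NormalDist().cdf(x / math.sqrt(t)))
print(estimate_max, exact_max)
```

The simulated frequency should be close to, and typically slightly below, the value $2(1 - \Phi(x/\sqrt{t}))$ given by Theorem 3.15.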


A simple application of the result is given in the following example, from which
it follows that Brownian motion changes sign in $(0, \varepsilon)$, for any $\varepsilon$ however small.


Example 3.8: We find the probability $P(B(t) \le 0 \text{ for all } t, 0 \le t \le 1)$. Note that
the required probability involves uncountably many random variables: all $B(t)$'s are
less than or equal to zero, $0 \le t \le 1$; we want to know the probability that the entire
path from 0 to 1 will stay below 0. We could calculate the desired probability for
$n$ values of the process and then take the limit as $n \to \infty$. But it is simpler in this
case to express this probability as a function of the whole path. All $B(t)$'s are less
than or equal to zero if and only if their maximum is less than or equal to zero,

$$\{B(t) \le 0 \text{ for all } t, 0 \le t \le 1\} = \{\max_{0 \le t \le 1} B(t) \le 0\},$$

and consequently these events have the same probability. Now,

$$P(\max_{0 \le t \le 1} B(t) \le 0) = 1 - P(\max_{0 \le t \le 1} B(t) > 0).$$

By the law of the maximum of Brownian motion,

$$P(\max_{0 \le t \le 1} B(t) > 0) = 2P(B(1) > 0) = 1.$$

Hence $P(B(t) \le 0 \text{ for all } t, 0 \le t \le 1) = 0$.


To find the distribution of the minimum of Brownian motion
$m(t) = \min_{0 \le s \le t} B(s)$ we use the symmetry argument, and that

$$-\min_{0 \le s \le t} B(s) = \max_{0 \le s \le t} (-B(s)).$$


Theorem 3.16 If $B(t)$ is a Brownian Motion with $B(0) = 0$, then $\hat{B}(t) =
-B(t)$ is also a Brownian motion with $\hat{B}(0) = 0$.


PROOF: The process $\hat{B}(t) = -B(t)$ has independent and normally distributed
increments. It also has continuous paths, therefore it is Brownian motion.


Theorem 3.17 For any $x < 0$

$$P_0(\min_{0 \le s \le t} B(s) \le x) = 2P_0(B(t) \ge -x) = 2P_0(B(t) \le x).$$


The proof is straightforward and is left as an exercise. 


3.7 Distribution of Hitting Times 


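$T_x$ is finite by Theorem 3.14. The result below gives the distribution of $T_x$ and
establishes that $T_x$ has infinite mean.


Theorem 3.18 The probability density of $T_x$ is given by

$$f_{T_x}(t) = \frac{|x|}{\sqrt{2\pi}}\, t^{-\frac{3}{2}} e^{-\frac{x^2}{2t}},$$

which is the Inverse Gamma density with parameters $\frac{1}{2}$ and $\frac{x^2}{2}$. $E_0 T_x = +\infty$.


PROOF: Take $x > 0$. The events $\{M(t) \ge x\}$ and $\{T_x \le t\}$ are the same, so
that

$$P(T_x \le t) = P(M(t) \ge x) = 2P(B(t) \ge x) = \int_x^{\infty} \sqrt{\frac{2}{\pi t}}\, e^{-\frac{y^2}{2t}} dy.$$

The formula for the density of $T_x$ is obtained by differentiation after the change
of variables $u = \frac{y}{\sqrt{t}}$ in the integral. Finally,

$$E_0 T_x = \frac{|x|}{\sqrt{2\pi}} \int_0^{\infty} t^{-\frac{1}{2}} e^{-\frac{x^2}{2t}} dt = \infty, \quad \text{since } t^{-\frac{1}{2}} e^{-\frac{x^2}{2t}} \sim 1/\sqrt{t},\ t \to \infty.$$

For $x < 0$ the proof is similar.


Remark 3.6: The property $P(T_x < \infty) = 1$ is called the recurrence property
of Brownian motion. Although $P(T_x < \infty) = 1$, $E(T_x) = \infty$: even though $x$ is
visited with probability one, the expected time for it to happen is infinite.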
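
As a consistency check on Theorem 3.18, the density can be integrated numerically over $(0,t)$ and compared with $P(T_x \le t) = 2(1 - \Phi(|x|/\sqrt{t}))$. The sketch below uses a midpoint rule; the function names, the number of nodes and the values $x = 1$, $t = 2$ are assumptions of the example.

```python
import math
from statistics import NormalDist


def hitting_density(t, x):
    """Density of T_x at t > 0: |x| / sqrt(2 pi) * t^(-3/2) * exp(-x^2 / (2 t))."""
    return abs(x) / math.sqrt(2.0 * math.pi) * t ** -1.5 * math.exp(-x * x / (2.0 * t))


def hitting_cdf_numeric(t, x, n=20_000):
    """Midpoint-rule integral of the density over (0, t)."""
    h = t / n
    return h * sum(hitting_density((k + 0.5) * h, x) for k in range(n))


x, t = 1.0, 2.0
numeric = hitting_cdf_numeric(t, x)
closed_form = 2.0 * (1.0 - NormalDist().cdf(abs(x) / math.sqrt(t)))
print(numeric, closed_form)
```

The two values agree to many decimal places. The slow $t^{-3/2}$ decay of the density is what makes the mean infinite even though the total mass is one.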

The next result looks at hitting times $T_x$ as a process in $x$.


Theorem 3.19 The process of hitting times $\{T_x\}$, $x \ge 0$, has increments
independent of the past, that is, for any $0 < a < b$, $T_b - T_a$ is independent of
$B(t)$, $t \le T_a$, and the distribution of the increment $T_b - T_a$ is the same as that
of $T_{b-a}$ and it is given by the density

$$f_{T_b - T_a}(t) = \frac{b-a}{\sqrt{2\pi}}\, t^{-\frac{3}{2}} e^{-\frac{(b-a)^2}{2t}}.$$


PROOF: By the strong Markov property $\hat{B}(t) = B(T_a + t) - B(T_a)$ is Brownian
motion started at zero, and independent of the past $B(t)$, $t \le T_a$. $T_b - T_a =
\inf\{t \ge 0 : \hat{B}(t) = b - a\}$. Hence $T_b - T_a$ is the same as the first hitting time of
$b - a$ by $\hat{B}$.


3.8 Reflection Principle and Joint Distributions 


Let $B(t)$ be a Brownian motion started at $x$, and $\hat{B}(t) = -B(t)$. Then $\hat{B}(t)$ is
a Brownian motion started at $-x$. The proof is straightforward by checking
the defining properties. This is the simplest form of the reflection principle.
Here the Brownian motion is reflected about the horizontal axis.

In greater generality, the process that is obtained by reflection of a Brownian
motion about the horizontal line passing through $(T, B(T))$, for a stopping
time $T$, is also a Brownian motion.

Note that for $t \ge T$ the reflected path is $\hat{B}(t) - B(T) = -(B(t) - B(T))$,
giving $\hat{B}(t) = 2B(T) - B(t)$.


Theorem 3.20 (Reflection Principle) Let $T$ be a stopping time. Define
$\hat{B}(t) = B(t)$ for $t \le T$, and $\hat{B}(t) = 2B(T) - B(t)$ for $t > T$. Then $\hat{B}$ is also
Brownian motion.


The proof is beyond the scope of this book, but a heuristic justification is
that for $t \ge T$ the process $-(B(t) - B(T))$ is also a Brownian motion by the
strong Markov property, so that $\hat{B}$, constructed from the Brownian motion
before the stopping time, and another Brownian motion after the stopping
time, is again Brownian motion. For a rigorous proof see Freedman (1971).
Using the Reflection Principle, the joint distribution of the Brownian motion
with its maximum can be obtained.


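Theorem 3.21 The joint distribution of $(B(t), M(t))$ has the density

$$f_{B,M}(x,y) = \sqrt{\frac{2}{\pi}}\, \frac{(2y-x)}{t^{3/2}}\, e^{-\frac{(2y-x)^2}{2t}}, \quad \text{for } y \ge 0,\ x \le y. \qquad (3.16)$$


PROOF: Let, for $y > 0$ and $y > x$, $\hat{B}(t)$ be $B(t)$ reflected at $T_y$. Then

$$P(B(t) \le x,\ M(t) \ge y) = P(B(t) \le x,\ T_y \le t) \quad (\text{since } \{M(t) \ge y\} = \{T_y \le t\})$$
$$= P(T_y \le t,\ \hat{B}(t) \ge 2y - x) \quad (\text{on } \{T_y \le t\},\ \hat{B}(t) = 2y - B(t))$$
$$= P(T_y \le t,\ B(t) \ge 2y - x) \quad (\text{since } T_y \text{ is the same for } B \text{ and } \hat{B})$$
$$= P(B(t) \ge 2y - x) \quad (\text{since } y - x > 0, \text{ and } \{B(t) \ge 2y - x\} \subset \{T_y \le t\}).$$

The density is obtained by differentiation.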


It is possible to show (see for example, Karatzas and Shreve 1988, p.123-24) 
that |B(t)| and M(t) — B(t) have the same distribution. 
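
The joint density (3.16) can be checked against the reflection identity from the proof of Theorem 3.21: integrating the density over $\{B(t) \le x, M(t) \ge y\}$ should give $P(B(t) \ge 2y - x) = 1 - \Phi((2y-x)/\sqrt{t})$ for Brownian motion started at zero. The sketch below does this with a midpoint rule on a truncated region; the function names, the grid size and the truncation width are assumptions of the example.

```python
import math
from statistics import NormalDist


def joint_density(x, y, t):
    """Density (3.16) of (B(t), M(t)) at (x, y), for y >= 0 and x <= y."""
    z = 2.0 * y - x
    return math.sqrt(2.0 / math.pi) * z / t ** 1.5 * math.exp(-z * z / (2.0 * t))


def prob_below_x_max_above_y(x, y, t, n=400, width=8.0):
    """Midpoint-rule double integral of the density over {b <= x, m >= y}."""
    span = width * math.sqrt(t)  # truncation range; the tails beyond are negligible
    h = span / n
    total = 0.0
    for i in range(n):
        b = x - (i + 0.5) * h      # values of B(t) below x
        for j in range(n):
            m = y + (j + 0.5) * h  # values of M(t) above y
            total += joint_density(b, m, t)
    return total * h * h


x, y, t = 0.5, 1.0, 1.0
numeric_joint = prob_below_x_max_above_y(x, y, t)
reflection = 1.0 - NormalDist().cdf((2.0 * y - x) / math.sqrt(t))
print(numeric_joint, reflection)
```

The two numbers agree up to the discretization error of the midpoint rule, which supports the reconstruction of the density from the distribution function by differentiation.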


Theorem 3.22 The two processes $|B(t)|$ and $M(t) - B(t)$ are both Markov
processes with transition probability density function $p_t(x,y) + p_t(x,-y)$, where
$p_t(x,y) = \frac{1}{\sqrt{2\pi t}} e^{-\frac{(x-y)^2}{2t}}$ is the transition probability function of Brownian
motion. Consequently they have the same finite-dimensional distributions.


The next result gives the joint distribution of $B(t)$, $M(t)$ and $m(t)$; for a proof
see Freedman (1971), p.26-27.


Theorem 3.23

$$P(a < m(t) \le M(t) < b, \text{ and } B(t) \in A) = \int_A k(y)\,dy, \qquad (3.17)$$

where $k(y) = \sum_{n=-\infty}^{\infty} \big(p_t(2n(b-a), y) - p_t(2a, 2n(b-a) + y)\big)$ for $t > 0$ and
$a < 0 < b$.


Remark 3.7: Joint distributions given above are used in pricing of the so- 
called barrier options, see Chapter 11. 


3.9 Zeros of Brownian Motion. Arcsine Law 


A time point $\tau$ is called a zero of Brownian motion if $B(\tau) = 0$. As an
application of the distribution of the maximum we obtain the following information
about zeros of Brownian motion. Below $\{B^x(t)\}$ denotes Brownian motion
started at $x$.


Theorem 3.24 For any $x \ne 0$, the probability that $\{B^x(t)\}$ has at least one
zero in the time interval $(0,t)$ is given by

$$\frac{|x|}{\sqrt{2\pi}} \int_0^t u^{-\frac{3}{2}} e^{-\frac{x^2}{2u}} du.$$


PROOF: If $x < 0$, then due to continuity of $B^x(t)$ (draw a picture of this
event),

$$P(B^x \text{ has at least one zero between 0 and } t) = P(\max_{0 \le s \le t} B^x(s) \ge 0).$$

Since $B^x(t) = B(t) + x$, where $B(t)$ is Brownian motion started at zero at time
zero,

$$P_x(B \text{ has a zero between 0 and } t) = P(\max_{0 \le s \le t} B^x(s) \ge 0)$$
$$= P_0(\max_{0 \le s \le t} B(s) + x \ge 0) = P_0(\max_{0 \le s \le t} B(s) \ge -x)$$
$$= \int_0^t f_{T_x}(u)\,du = \frac{|x|}{\sqrt{2\pi}} \int_0^t u^{-\frac{3}{2}} e^{-\frac{x^2}{2u}} du.$$

For $x > 0$ the proof is similar, and is based on the distribution of the minimum
of Brownian motion.


Using this result we can establish


Theorem 3.25 The probability that Brownian motion $B(t)$ has at least one
zero in the time interval $(a,b)$ is given by

$$\frac{2}{\pi} \arccos\sqrt{\frac{a}{b}}.$$


PROOF: Denote by $h(x) = P(B \text{ has at least one zero in } (a,b)|B_a = x)$. By
the Markov property $P(B \text{ has at least one zero in } (a,b)|B_a = x)$ is the same
as $P(B^x \text{ has at least one zero in } (0, b-a))$. By conditioning

$$P(B \text{ has at least one zero in } (a,b))$$
$$= \int_{-\infty}^{\infty} P(B \text{ has at least one zero in } (a,b)|B_a = x)\,P(B_a \in dx)$$
$$= \int_{-\infty}^{\infty} h(x)\,P(B_a \in dx) = \sqrt{\frac{2}{\pi a}} \int_0^{\infty} h(x)\, e^{-\frac{x^2}{2a}} dx.$$

Putting in the expression for $h(x)$ from Theorem 3.24 and performing
the necessary calculations we obtain the result.


The Arcsine law now follows:


Theorem 3.26 The probability that Brownian motion $\{B(t)\}$ has no zeros in
the time interval $(a,b)$ is given by $\frac{2}{\pi} \arcsin\sqrt{\frac{a}{b}}$.
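
The "necessary calculations" in the proof of Theorem 3.25 can be verified numerically. The sketch below evaluates the integral $\sqrt{2/(\pi a)}\int_0^\infty h(x) e^{-x^2/(2a)}dx$ by a midpoint rule, using for $h$ the closed form $2(1 - \Phi(|x|/\sqrt{b-a}))$ of the probability in Theorem 3.24 (which equals $P(T_x \le b-a)$ by Theorems 3.15 and 3.18). The function names, truncation point and grid size are assumptions of the example.

```python
import math
from statistics import NormalDist


def h(x, length):
    """P(B^x has a zero in (0, length)) = 2 * (1 - Phi(|x| / sqrt(length)))."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(x) / math.sqrt(length)))


def prob_zero_in(a, b, n=20_000):
    """Midpoint-rule value of sqrt(2/(pi a)) * int_0^inf h(x) exp(-x^2/(2a)) dx."""
    upper = 10.0 * math.sqrt(a)  # the Gaussian weight is negligible beyond this
    step = upper / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * step
        total += h(x, b - a) * math.exp(-x * x / (2.0 * a))
    return math.sqrt(2.0 / (math.pi * a)) * total * step


for a, b in ((1.0, 2.0), (1.0, 4.0)):
    print(prob_zero_in(a, b), 2.0 / math.pi * math.acos(math.sqrt(a / b)))
```

For $(a,b) = (1,2)$ both columns give $1/2$ and for $(1,4)$ both give $2/3$, so the probability of no zeros is $\frac{2}{\pi}\arcsin\sqrt{a/b}$, since $\arccos z + \arcsin z = \pi/2$.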


The next result gives distributions of the last zero before $t$, and the first zero
after $t$. Let

$$\gamma_t = \sup\{s \le t : B(s) = 0\} = \text{last zero before } t. \qquad (3.18)$$
$$\beta_t = \inf\{s \ge t : B(s) = 0\} = \text{first zero after } t. \qquad (3.19)$$

Note that $\beta_t$ is a stopping time but $\gamma_t$ is not.


Theorem 3.27

$$P(\gamma_t \le x) = \frac{2}{\pi} \arcsin\sqrt{\frac{x}{t}} \qquad (3.20)$$
$$P(\beta_t \ge y) = \frac{2}{\pi} \arcsin\sqrt{\frac{t}{y}} \qquad (3.21)$$
$$P(\gamma_t \le x,\ \beta_t \ge y) = \frac{2}{\pi} \arcsin\sqrt{\frac{x}{y}} \qquad (3.22)$$


PROOF: All of these follow from the previous result. For example,

$$P(\gamma_t \le x) = P(B \text{ has no zeros in } (x,t)).$$
$$P(\gamma_t \le x,\ \beta_t \ge y) = P(B \text{ has no zeros in } (x,y)).$$


Since Brownian motion is continuous, and it has no zeros on the interval
$(\gamma_t, \beta_t)$, it keeps the same sign on this interval, either positive or negative.
When Brownian motion is entirely positive or entirely negative on an interval,
it is said that it is an excursion of Brownian motion. Thus the previous
result states that excursions have the arcsine law. To picture a Brownian path
consider for every realization $B = \{B(t), 0 \le t \le 1\}$ the set of its zeros on the
interval $[0,1]$, that is, the random set $L_0 = L_0(B) = \{t : B(t) = 0, 0 \le t \le 1\}$.


Theorem 3.28 The set of zeros of Brownian motion is a random uncountable 
closed set without isolated points and has Lebesgue measure zero. 


PROOF: According to Example 3.8, the probability that Brownian motion
stays below zero on the interval $[0,1]$ is zero. Therefore it changes sign on this
interval. This implies, since Brownian motion is continuous, that it has a zero
inside $[0,1]$. The same reasoning leads to the conclusion that for any positive
$t$, the probability that Brownian motion has the same sign on the interval $[0,t]$
is zero. Therefore it has a zero inside $[0,t]$ for any $t$, no matter how small it
is. This implies that the set of zeros is an infinite set, moreover time $t = 0$ is
a limit of zeros from the right.

Observe next that the set of zeros is closed, that is, if $B(\tau_n) = 0$ and
$\lim_{n\to\infty} \tau_n = \tau$ then $B(\tau) = 0$. This is true since $B(t)$ is a continuous function
of $t$.

By using the strong Markov property, it is possible to see that any zero of
Brownian motion is a limit of other zeros. If $B(\tau) = 0$, and $\tau$ is a stopping
time, then by (3.13) $\hat{B}(t) = B(t+\tau) - B(\tau) = B(t+\tau)$ is again Brownian
motion started anew at time $\tau$. Therefore time $t = 0$ for the new Brownian
motion $\hat{B}$ is a limit from the right of zeros of $\hat{B}$. But $\hat{B}(t) = B(t+\tau)$, so that
$\tau$ is a limit from the right of zeros of $B$. However, not every zero of Brownian
motion is a stopping time. For example, for a fixed $t$, $\gamma_t$, the last zero before $t$,
is not a stopping time. Nevertheless, using a more intricate argument, one can
see that any zero is a limit of other zeros. A sketch is given below. If $\tau$ is the
first zero after $t$, then $\tau$ is a stopping time. Thus the set of all sample paths
such that $\tau$ is a limit point of zeros from the right has probability one. The
intersection of such sets over all rational $t$'s is again a set of probability one.
Therefore for almost all sample paths the first zero that follows any rational
number is a limit of zeros from the right. This implies that any point of $L_0$
is a limit of points from $L_0$ (it is a perfect set). A general result from set
theory, which is not hard to prove, states that if an infinite set coincides with
the set of its limit points, then it is uncountable.

Although uncountable, $L_0$ has Lebesgue measure zero. This is seen by
writing the Lebesgue measure of $L_0$ as $|L_0| = \int_0^1 I(B(t) = 0)\,dt$. It is a
non-negative random variable. Taking the expectation, and interchanging the
integrals by Fubini's theorem,

$$E|L_0| = E\int_0^1 I(B(t) = 0)\,dt = \int_0^1 P(B(t) = 0)\,dt = 0.$$

This implies $P(|L_0| = 0) = 1$.


Theorem 3.29 Any level set $L_a = \{t : B(t) = a, 0 \le t \le 1\}$ has the same
properties as $L_0$.


PROOF: Let $T_a$ be the first time with $B(t) = a$. Then by the strong Markov
property, $\hat{B}(t) = B_{T_a+t} - B_{T_a} = B_{T_a+t} - a$ is a Brownian motion. The set of
zeros of $\hat{B}$ is the level $a$ set of $B$.


3.10 Size of Increments of Brownian Motion 


Increments over large time intervals satisfy the Law of Large Numbers and
the Law of the Iterated Logarithm. For proofs see, for example, Karatzas and
Shreve (1988).


Theorem 3.30 (Law of Large Numbers)

$$\lim_{t\to\infty} \frac{B(t)}{t} = 0 \quad \text{a.s.}$$


A more precise result is provided by the Law of Iterated Logarithm.


Theorem 3.31 (Law of Iterated Logarithm)

$$\limsup_{t\to\infty} \frac{B(t)}{\sqrt{2t\ln\ln t}} = 1, \quad \text{a.s.}$$
$$\liminf_{t\to\infty} \frac{B(t)}{\sqrt{2t\ln\ln t}} = -1 \quad \text{a.s.}$$


To obtain the behaviour for small $t$ near zero the process $W(t) = tB(1/t)$ is
considered, which is also Brownian motion.


Example 3.9: Let $B(t)$ be Brownian motion. The process $W(t)$ defined as $W(t) =
tB(1/t)$, for $t > 0$, and $W(0) = 0$, is also Brownian motion. Indeed, $W(t)$ has
continuous paths. Continuity at zero follows from the Law of Large Numbers. It is,
clearly, a Gaussian process, and has zero mean. Its covariance is given by

$$Cov(W(t), W(s)) = E(W(t)W(s)) = tsE(B(1/t)B(1/s)) = ts(1/t) = s, \quad \text{for } s < t.$$

Since $W(t)$ is a Gaussian process with zero mean, and the covariance of Brownian
motion, it is Brownian motion.


This result allows us to transfer results on the behaviour of paths of Brownian 
motion for large t to that of small t. For example, we have immediately the 
Law of Iterated Logarithm near zero, from the same law near infinity. 


Graphs of some Functions of Brownian Motion 


Graphs of some Functions of Brownian motion are given in order to visualize
these processes. To obtain these, 1000 independent Normal random variables
with mean zero and variance 0.001 were generated. Time is taken to be discrete
(as any other variable on a computer) varying from 0 to 1 with steps of 0.001.
The first two pictures in Figure 3.2 are realizations of White noise. Pictures
of $0.1B(t) + t$ and $B(t) + 0.1t$ (second row of Figure 3.2) demonstrate that
when noise is small in comparison with drift, the drift dominates, and if drift
is small, then the noise dominates in the behaviour of the process. The next
two are realizations of the martingale $B^2(t) - t$ which has zero mean. By the
recurrence property of Brownian motion, $B(t)$ will always come back to zero.
Thus $B^2(t) - t$ will always come back to $-t$ in the long run. The last two
pictures are realizations of the exponential martingale $e^{B(t) - t/2}$. Although this
martingale has mean 1, $\lim_{t\to\infty} e^{B(t) - t/2} = 0$, which can be seen by the Law
of Large Numbers. Therefore a realization of this martingale will approach
zero in the long run.

Figure 3.2: White noise and Functions of Brownian motion.
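
The construction described for the figure can be reproduced along the following lines. This is a sketch under assumptions, not the code used for the original figure: the function name `simulate_paths` and the seed are hypothetical, and plotting is left out so that only the standard library is needed.

```python
import math
import random


def simulate_paths(n_steps=1000, dt=0.001, seed=4):
    """Return time grid, white noise, and B(t) built from N(0, dt) increments."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]
    times = [k * dt for k in range(n_steps + 1)]
    brownian = [0.0]
    for increment in noise:
        brownian.append(brownian[-1] + increment)
    return times, noise, brownian


times, noise, B = simulate_paths()
small_noise = [0.1 * b + t for t, b in zip(times, B)]         # 0.1 B(t) + t
small_drift = [b + 0.1 * t for t, b in zip(times, B)]         # B(t) + 0.1 t
square_mart = [b * b - t for t, b in zip(times, B)]           # B(t)^2 - t
exp_mart = [math.exp(b - t / 2.0) for t, b in zip(times, B)]  # e^{B(t) - t/2}
print(len(times), small_noise[-1], exp_mart[0])
```

Each list can be passed, together with `times`, to any plotting tool to obtain pictures of the kind described above; different seeds give different realizations.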


3.11 Brownian Motion in Higher Dimensions


Definition 3.32 Define Brownian motion in dimension two and higher as
a random vector $B(t) = (B^1(t), B^2(t), \ldots, B^n(t))$ with all coordinates $B^i(t)$
being independent one-dimensional Brownian motions.


Alternatively Brownian motion in $\mathbb{R}^n$ can be defined as a process with
independent multivariate Gaussian increments. It follows from the definitions,
similarly to the one-dimensional case, that Brownian motion in $\mathbb{R}^n$ is a Markov
Gaussian process homogeneous both in space and time. Its transition
probability density is given by

$$p_t(x, y) = \frac{1}{(2\pi t)^{n/2}}\, e^{-\frac{|x-y|^2}{2t}}, \qquad (3.23)$$

where $x$, $y$ are $n$-dimensional vectors and $|x|$ is the length of $x$.


Remark 3.8: In dimensions one and two Brownian motion is recurrent, that 
is, it will come back to a neighbourhood, however small, of any point infinitely 
often. In dimensions three and higher Brownian motion is transient, it will 
leave a ball, however large, around any point never to return (Polya (1922)). 


3.12 Random Walk 


The analogue of the Brownian motion process in discrete time $t = 0, 1, 2, \ldots, n, \ldots$
is the Random Walk process. Brownian motion can be constructed as the limit
of Random Walks, when step sizes get smaller and smaller. Random Walks
occur in many applications, including Insurance, Finance and Biology.

A model of pure chance is served by an ideal coin being tossed with equal
probabilities for the Heads and Tails to come up. Introduce a random variable
$\xi$ taking values $+1$ (Heads) and $-1$ (Tails) with probability $\frac{1}{2}$. If the coin is
tossed $n$ times then a sequence of random variables $\xi_1, \xi_2, \ldots, \xi_n$ describes this
experiment. All $\xi_i$ have exactly the same distribution as $\xi_1$, moreover they are
all independent. The process $S_n$ is a Random Walk, defined by $S_0 = 0$ and

$$S_n = \xi_1 + \xi_2 + \ldots + \xi_n. \qquad (3.24)$$

$S_n$ gives the amount of money after $n$ plays when betting \$1 on the
outcomes of a coin when \$1 is won if Heads come up, but lost otherwise.

Since $E(\xi_i) = 0$, and $Var(\xi_i) = E(\xi_i^2) = 1$, the mean and the variance of
the random walk are given by

$$E(S_n) = E(\xi_1 + \xi_2 + \ldots + \xi_n) = E(\xi_1) + E(\xi_2) + \ldots + E(\xi_n) = 0,$$
$$Var(S_n) = Var(\xi_1) + Var(\xi_2) + \ldots + Var(\xi_n) = nVar(\xi_1) = n,$$

as the variance of a sum of independent variables equals the sum of variances.
More generally, a random walk is the process

$$S_n = S_0 + \sum_{i=1}^{n} \xi_i, \qquad (3.25)$$

where $\xi_i$'s are independent and identically distributed random variables. In
particular, this model contains gambling on the outcomes of a biased coin
$P(\xi_i = 1) = p$, $P(\xi_i = -1) = q = 1 - p$.
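
The mean and variance formulas can be confirmed exactly for small $n$ by building the distribution of $S_n$ step by step. The following sketch uses exact rational arithmetic; the function names and the choices $n = 10$, $p = 1/2$ and $p = 2/3$ are assumptions of the example.

```python
from fractions import Fraction


def walk_distribution(n, p):
    """Exact distribution of S_n = xi_1 + ... + xi_n, S_0 = 0, P(xi = 1) = p."""
    q = 1 - p
    dist = {0: Fraction(1)}
    for _ in range(n):
        new_dist = {}
        for value, prob in dist.items():
            new_dist[value + 1] = new_dist.get(value + 1, Fraction(0)) + prob * p
            new_dist[value - 1] = new_dist.get(value - 1, Fraction(0)) + prob * q
        dist = new_dist
    return dist


def mean_and_variance(dist):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(v * pr for v, pr in dist.items())
    variance = sum((v - mean) ** 2 * pr for v, pr in dist.items())
    return mean, variance


fair = mean_and_variance(walk_distribution(10, Fraction(1, 2)))
biased = mean_and_variance(walk_distribution(10, Fraction(2, 3)))
print(fair, biased)
```

For the fair coin the mean is 0 and the variance is $n = 10$. For $p = 2/3$ the step has mean $p - q = 1/3$ and variance $1 - (p-q)^2 = 8/9$, so $S_{10}$ has mean $10/3$ and variance $80/9$.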


Martingales in Random Walks


Some interesting questions about Random Walks, such as ruin probabilities 
and the like, can be answered with the help of martingales. 


Theorem 3.33 The following are martingales. 


1. Sn—un, where u = E(€1). In particular, if the Random Walk is unbiased 
(u = 0), then it is itself is a martingale. 


2. (Sn — pn)? — o°n, where o? = E(€, — u)? = Var(&). 


3. For any u, etS»—"() where h(u) = ME(e“’). In particular, in the 
case P(&; = 1) =p, P(& = —1)= q= 1 — p, (en is a martingale. 


PROOF: 
1. Since, by the triangle inequality,

$$E|S_n - n\mu| = E\Big|S_0 + \sum_{i=1}^{n}\xi_i - n\mu\Big| \le E|S_0| + \sum_{i=1}^{n}E|\xi_i| + n|\mu| = E|S_0| + n\big(E|\xi_1| + |\mu|\big),$$

$S_n - n\mu$ is integrable provided $E|\xi_1| < \infty$ and $E|S_0| < \infty$. To establish the martingale property consider for any $n$

$$E(S_{n+1}|S_n) = S_n + E(\xi_{n+1}|S_n).$$

Since $\xi_{n+1}$ is independent of the past, and $S_n$ is determined by the first $n$ variables, $\xi_{n+1}$ is independent of $S_n$. Therefore, $E(\xi_{n+1}|S_n) = E(\xi_{n+1})$. It now follows that

$$E(S_{n+1}|S_n) = S_n + E(\xi_{n+1}|S_n) = S_n + \mu,$$

and subtracting $(n+1)\mu$ from both sides of the equation, the martingale property is obtained,

$$E(S_{n+1} - (n+1)\mu \,|\, S_n) = S_n - n\mu.$$
2. This is left as an exercise.
3. Put $M_n = e^{uS_n - nh(u)}$. Since $M_n \ge 0$, $E|M_n| = E(M_n)$, which is given by

$$E(M_n) = E e^{uS_n - nh(u)} = e^{-nh(u)} E e^{uS_n} = e^{-nh(u)} E e^{u(S_0 + \sum_{i=1}^{n}\xi_i)}$$
$$= e^{uS_0} e^{-nh(u)} E\prod_{i=1}^{n} e^{u\xi_i} = e^{uS_0} e^{-nh(u)} \prod_{i=1}^{n} E(e^{u\xi_i}) \quad \text{by independence}$$
$$= e^{uS_0} e^{-nh(u)} \prod_{i=1}^{n} e^{h(u)} = e^{uS_0} < \infty.$$

The martingale property is shown by using the fact that

$$S_{n+1} = S_n + \xi_{n+1}, \qquad (3.26)$$

with $\xi_{n+1}$ independent of $S_n$ and of all previous $\xi_i$'s, $i \le n$, or independent of $\mathcal{F}_n$. Using the properties of conditional expectation, we have

$$E(e^{uS_{n+1}}|\mathcal{F}_n) = E(e^{uS_n + u\xi_{n+1}}|\mathcal{F}_n) = e^{uS_n} E(e^{u\xi_{n+1}}|\mathcal{F}_n) = e^{uS_n} E(e^{u\xi_{n+1}}) = e^{uS_n + h(u)}.$$

Multiplying both sides of the above equation by $e^{-(n+1)h(u)}$, the martingale property is obtained, $E(M_{n+1}|\mathcal{F}_n) = M_n$.

In the special case when $P(\xi_i = 1) = p$, $P(\xi_i = -1) = q = 1 - p$, choosing $u = \ln(q/p)$ in the previous martingale, we have $e^{u\xi_1} = (q/p)^{\xi_1}$ and $E(e^{u\xi_1}) = 1$. Thus $h(u) = \ln E(e^{u\xi_1}) = 0$, and $e^{uS_n - nh(u)} = (q/p)^{S_n}$. Alternatively in this case, the martingale property of $(q/p)^{S_n}$ is easy to verify directly.
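The direct verification mentioned above amounts to the one-step identity $p\,(q/p)^{s+1} + q\,(q/p)^{s-1} = (q/p)^{s}$. A minimal Python sketch of this check in exact arithmetic follows; the helper name `one_step_expectation` is ours and purely illustrative.

```python
from fractions import Fraction

def one_step_expectation(s, p):
    """E[(q/p)^(S_{n+1}) | S_n = s] for a walk with
    P(xi = 1) = p and P(xi = -1) = q = 1 - p."""
    q = 1 - p
    r = q / p
    return p * r ** (s + 1) + q * r ** (s - 1)

p = Fraction(1, 3)          # example value of p, chosen arbitrarily
r = (1 - p) / p             # q/p
for s in range(-3, 4):
    # martingale property: the conditional expectation equals the current value
    assert one_step_expectation(s, p) == r ** s
print("martingale identity holds for s = -3..3")
```

The identity holds because $p\,(q/p) + q\,(p/q) = q + p = 1$, which is the statement $E(e^{u\xi_1}) = 1$ for $u = \ln(q/p)$.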


3.13 Stochastic Integral in Discrete Time 


Let $S_n$ be an unbiased Random Walk representing the capital of a player when betting on a fair coin. Let $H_n$ be the amount of money (the number of betting units) a gambler will bet at time $n$. This can be based on the outcomes of the game at times $1, \ldots, n-1$ but not on the outcome at time $n$. This is an example of a predictable process. The concept of predictable processes plays a most important role in stochastic calculus. The process $\{H_n\}$ is called predictable if $H_n$ can be predicted with certainty from the information available at time $n-1$. Rigorously, let $\mathcal{F}_{n-1}$ be the $\sigma$-field generated by $S_0, S_1, \ldots, S_{n-1}$.


Definition 3.34 $\{H_n\}$ is called predictable if for all $n \ge 1$, $H_n$ is $\mathcal{F}_{n-1}$-measurable.
If a betting strategy $\{H_n\}_{n=0}^{t}$ is used in the game of coin tossing, then the gain at time $t$ is given by

$$(H \cdot S)_t = \sum_{n=1}^{t} H_n (S_n - S_{n-1}), \qquad (3.27)$$

since $S_n - S_{n-1} = +1$ or $-1$ when the $n$-th toss results in a win or loss respectively. More generally, $S_n - S_{n-1} = \xi_n$ represents the amount of money lost or won on one betting unit on the $n$-th bet. If $H_n$ units are placed, then the amount of money lost or won on the $n$-th bet is $H_n(S_n - S_{n-1})$. The gain at time $t$ is obtained by adding up monies lost and won on all the bets.


Definition 3.35 The stochastic integral in discrete time of a predictable process $H$ with respect to the process $S$ is defined by

$$(H \cdot S)_t = H_0 S_0 + \sum_{n=1}^{t} H_n (S_n - S_{n-1}). \qquad (3.28)$$
The stochastic integral gives the gain in a game of chance when betting on S 
and the betting strategy H is used. For a martingale the stochastic integral 
(3.28) is also called a martingale transform. The next result states that a 
betting system used on a martingale will result again in a martingale. 


Theorem 3.36 If $M_n$ is a martingale, $H_n$ is predictable and the random variables $(H \cdot M)_t$ are integrable, then $(H \cdot M)_t$ is a martingale.


PROOF:
$$E[(H \cdot M)_{t+1}|\mathcal{F}_t] = E[(H \cdot M)_t|\mathcal{F}_t] + E[H_{t+1}(M_{t+1} - M_t)|\mathcal{F}_t]$$
$$= (H \cdot M)_t + H_{t+1} E[(M_{t+1} - M_t)|\mathcal{F}_t] = (H \cdot M)_t.$$


As a corollary we obtain 


Theorem 3.37 If $M_n$ is a martingale, $H_n$ is predictable and bounded, then $(H \cdot M)_t$ is a martingale.
PROOF: The assumption of bounded $H_n$, say $|H_n| \le C$, implies that

$$E|(H \cdot M)_t| = E\Big|\sum_{n=1}^{t} H_n (M_n - M_{n-1})\Big| \le \sum_{n=1}^{t} E|H_n (M_n - M_{n-1})| \le 2C \sum_{n=1}^{t} E|M_n| < \infty,$$

where the last bound uses $E|M_{n-1}| \le E|M_n|$. So $(H \cdot M)_t$ is integrable, and the condition of the previous theorem is fulfilled.
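The statement that a bounded betting system cannot turn a fair game into a favourable one can be illustrated by exact enumeration. The sketch below computes $E(H \cdot S)_t$ over all $2^t$ equally likely fair-coin paths, with $S_0 = 0$; the strategy `bet_after_loss` is a hypothetical example, and the function names are ours.

```python
from itertools import product
from fractions import Fraction

def gain(strategy, tosses):
    """(H . S)_t = sum_n H_n (S_n - S_{n-1}), with S_0 = 0.
    The stake H_n is computed from the tosses before n only (predictable)."""
    total = 0
    for n, xi in enumerate(tosses):
        h = strategy(tosses[:n])   # the strategy sees the past, not toss n
        total += h * xi
    return total

def expected_gain(strategy, t):
    """Exact E[(H . S)_t] over all 2^t equally likely fair-coin paths."""
    paths = list(product((1, -1), repeat=t))
    return sum(Fraction(gain(strategy, w)) for w in paths) / len(paths)

def bet_after_loss(past):
    # hypothetical bounded strategy: stake 3 after a loss, otherwise stake 1
    return 3 if past and past[-1] == -1 else 1

print(expected_gain(bet_after_loss, 6))   # 0
```

Because the stake is a function of past tosses only, each term $H_n \xi_n$ has zero mean, so the expected gain is zero for every horizon $t$.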
Stopped Martingales
Let $(M_n, \mathcal{F}_n)$ be a martingale and $\tau$ be a stopping time. Recall the definition


Definition 3.38 A random time $\tau$ is a stopping time if for any $n$, $\{\tau > n\} \in \mathcal{F}_n$.


Take $H_n = 1$ if $n \le \tau$, and $H_n = 0$ if $n > \tau$, in other words, $H_n = I(\tau \ge n)$. Then $H_n$ is predictable, because $\{\tau \ge n\} = \{\tau > n-1\} \in \mathcal{F}_{n-1}$. The stochastic integral gives the martingale stopped at $\tau$,

$$(H \cdot M)_n = H_0 M_0 + H_1(M_1 - M_0) + \cdots + H_n(M_n - M_{n-1}) = M_{\tau \wedge n} = M_\tau I(\tau \le n) + M_n I(\tau > n).$$

Since $H_n = I(\tau \ge n)$ is bounded by 1, Theorem 3.37 implies that the process $(H \cdot M)_n = M_{\tau \wedge n}$ is a martingale. Thus we have shown


Theorem 3.39 A martingale stopped at a stopping time $\tau$, $M_{\tau \wedge n}$, is a martingale. In particular,

$$E M_{\tau \wedge n} = E M_0. \qquad (3.29)$$


We comment here that Theorem 3.39 also holds in continuous time, see Theorem 7.14. There it is a Basic Stopping result, which is harder to prove.


Example 3.10: (Doubling bets strategy). 

Consider the doubling strategy when betting on Heads in tosses of a fair coin. Bet $H_1 = 1$. If Heads comes up then stop. The profit is $G_1 = 1$. If the outcome is Tails, then bet $H_2 = 2$ on the second toss. If the second toss comes up Heads, then stop. The profit is $G_2 = 4 - 3 = 1$. If the game continues for $n$ steps (meaning that the first $n-1$ tosses did not result in a win) then bet $H_n = 2^{n-1}$ on the $n$-th toss. If the $n$-th toss comes up Heads then stop. The profit is $G_n = 2 \times 2^{n-1} - (1 + 2 + \cdots + 2^{n-1}) = 2^n - (2^n - 1) = 1$. The probability that the game will stop at a finite number of steps is one minus the probability that Heads never comes up. The probability of only Tails on the first $n$ tosses is, by independence, $2^{-n}$. The probability that Heads never comes up is the limit $\lim_{n\to\infty} 2^{-n} = 0$, thus the game will stop for sure. For any non-random time $T$ the gain process $G_t$, $t \le T$, is a martingale with zero mean. The doubling strategy does not contradict the result above, because the strategy uses an unbounded stopping time, the first time one dollar is won.
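The zero-mean claim for a fixed horizon $T$ can be seen by listing all outcomes: the player ends with $+1$ unless all $T$ tosses are Tails, in which case the loss is $2^T - 1$. A short Python sketch of this enumeration follows (function names are ours, for illustration only).

```python
from itertools import product
from fractions import Fraction

def doubling_gain(tosses):
    """Gain of the doubling strategy over the given tosses (+1 Heads, -1 Tails)."""
    gain, stake = 0, 1
    for xi in tosses:
        gain += stake * xi
        if xi == 1:        # first Heads: stop betting
            break
        stake *= 2         # after Tails, double the stake
    return gain

def gain_distribution(T):
    """Exact distribution of the gain when play is cut off at the fixed time T."""
    dist = {}
    for w in product((1, -1), repeat=T):
        g = doubling_gain(w)
        dist[g] = dist.get(g, 0) + Fraction(1, 2 ** T)
    return dist

dist = gain_distribution(5)
print(dist)                                   # gain 1 w.p. 31/32, gain -31 w.p. 1/32
print(sum(g * pr for g, pr in dist.items()))  # 0
```

The rare large loss exactly offsets the frequent small win, so the expected gain at any fixed time is zero even though the gain at the (unbounded) stopping time is always one.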


Further information on discrete time martingales and on their stopping is given 
in Chapter 7. 
3.14 Poisson Process


If the Brownian motion process is a basic model for cumulative small noise present continuously, the Poisson process is a basic model for cumulative noise that occurs as a shock.

Let $\lambda > 0$. A random variable $X$ has a Poisson distribution with parameter $\lambda$, denoted $Pn(\lambda)$, if it takes non-negative integer values $k \ge 0$ with probabilities

$$P(X = k) = e^{-\lambda}\frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots \qquad (3.30)$$

The moment generating function of this distribution is given by

$$E(e^{uX}) = e^{\lambda(e^u - 1)}. \qquad (3.31)$$


Defining Properties of Poisson process 


A Poisson process $N(t)$ is a stochastic process with the following properties.


1. (Independence of increments) $N(t) - N(s)$ is independent of the past, that is, of $\mathcal{F}_s$, the $\sigma$-field generated by $N(u)$, $u \le s$.


2. (Poisson increments) $N(t) - N(s)$, $t > s$, has a Poisson distribution with parameter $\lambda(t - s)$. If $N(0) = 0$, then $N(t)$ has the $Pn(\lambda t)$ distribution.


3. (Step function paths) The paths $N(t)$, $t \ge 0$, are increasing functions of $t$ changing only by jumps of size 1.


Remark 3.9: A definition of a Poisson process in a more general model (that contains extra information) is given by a pair $\{N(t), \mathcal{F}_t\}$, $t \ge 0$, where $\mathcal{F}_t$ is an increasing sequence of $\sigma$-fields (a filtration), $N(t)$ is an adapted process, i.e. $N(t)$ is $\mathcal{F}_t$-measurable, such that Properties 1-3 above hold.


Consider a model for occurrence of independent events. Define the rate $\lambda$ as the average number of events per unit of time. Let $N(t)$ be the number of events that occur up to time $t$, i.e. in the time interval $(0, t]$. Then $N(t) - N(s)$ gives the number of events that occur in the time interval $(s, t]$.

A Poisson process $N(t)$ can be constructed as follows. Let $\tau_1, \tau_2, \ldots$ be independent random variables with the exponential $\exp(\lambda)$ distribution, that is, $P(\tau_1 > t) = e^{-\lambda t}$. The $\tau$'s represent the times between occurrence of successive events. Let $T_n = \sum_{i=1}^{n} \tau_i$ be the time of the $n$-th event. Then

$$N(t) = \sup\{n : T_n \le t\}$$

counts the number of events up to time $t$. It is not hard to verify the defining properties of the Poisson process for this construction. $N(t)$ has the Poisson distribution with parameter $\lambda t$. Consequently,

$$P(N(t) = k) = e^{-\lambda t}\frac{(\lambda t)^k}{k!}, \quad k = 0, 1, 2, \ldots,$$

$$EN(t) = \lambda t, \quad \text{and} \quad \mathrm{Var}(N(t)) = \lambda t.$$
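The construction from exponential inter-arrival times translates directly into a simulation. The following Python sketch (the function name `poisson_count`, the seed and the parameter values are our illustrative choices) counts arrivals up to time $t$ and compares the sample mean and variance with $\lambda t$.

```python
import random

def poisson_count(lam, t, rng):
    """N(t) = sup{n : T_n <= t}, where T_n is a sum of n independent
    exponential(lam) inter-arrival times."""
    n, arrival = 0, rng.expovariate(lam)
    while arrival <= t:
        n += 1
        arrival += rng.expovariate(lam)
    return n

rng = random.Random(0)            # fixed seed so the run is reproducible
lam, t, runs = 2.0, 3.0, 20000
counts = [poisson_count(lam, t, rng) for _ in range(runs)]
mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / runs
print(mean, var)                  # both should be close to lam * t = 6
```

Agreement of the sample mean with the sample variance is a quick sanity check that the counts are Poisson-like.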


Variation and Quadratic Variation of the Poisson Process 


Let $0 = t_0^n < t_1^n < \ldots < t_n^n = t$ be a partition of $[0, t]$. Then it is easy to see that variation of a Poisson path is

$$V_N(t) = \lim \sum_{i=1}^{n} |N(t_i^n) - N(t_{i-1}^n)| = N(t) - N(0) = N(t), \qquad (3.32)$$

where the limit is taken when $\delta_n = \max_i (t_i^n - t_{i-1}^n) \to 0$ and $n \to \infty$. Recall that variation of a pure jump function is the sum of absolute values of the jumps, see Example 1.6. Since the Poisson process has only positive jumps of size one, (3.32) follows.

To calculate its quadratic variation, observe that $N(t_i^n) - N(t_{i-1}^n)$ takes only two values 0 and 1 for small $t_i^n - t_{i-1}^n$, hence it is the same as its square, $N(t_i^n) - N(t_{i-1}^n) = (N(t_i^n) - N(t_{i-1}^n))^2$. Thus the quadratic variation of $N$ is the same as its variation

$$[N, N](t) = \lim \sum_{i=1}^{n} (N(t_i^n) - N(t_{i-1}^n))^2 = N(t) - N(0) = N(t).$$

Thus for a Poisson process both the variation and quadratic variation are positive and finite.
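The role of the mesh size in the quadratic variation argument can be seen on a concrete path. In the Python sketch below the jump times are hypothetical values chosen by us so that two jumps fall into one cell of a coarse partition; the function name `variations` is also ours.

```python
def variations(times, t, n):
    """Sum of |increments| and sum of squared increments of the counting
    path N over n equal subintervals of [0, t]."""
    def N(u):
        return sum(1 for s in times if s <= u)   # number of jumps up to time u
    grid = [t * i / n for i in range(n + 1)]
    incs = [N(grid[i + 1]) - N(grid[i]) for i in range(n)]
    return sum(abs(d) for d in incs), sum(d * d for d in incs)

jumps = [0.5, 0.52, 2.0]               # hypothetical jump times on [0, 4]
print(variations(jumps, 4.0, 10))      # coarse partition: (3, 5)
print(variations(jumps, 4.0, 1000))    # fine partition:   (3, 3)
```

The first component always equals $N(t)$ because the path is increasing. The second equals $N(t)$ only once the partition is fine enough that every subinterval contains at most one jump.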


Poisson Process Martingales 


The process $N(t)$ is increasing, hence it cannot be a martingale. However, the compensated process $N(t) - \lambda t$ is a martingale. This martingale is analogous to the Brownian motion.


Theorem 3.40 The following are martingales.
1. $N(t) - \lambda t$.
2. $(N(t) - \lambda t)^2 - \lambda t$.
3. $e^{\ln(1-u)N(t) + u\lambda t}$, for any $0 < u < 1$.
PROOF: The martingale property follows from independence of increments, the Poisson distribution of increments and the expressions for the mean and the variance of Poisson distribution. We show the martingale property for the exponential martingale.

$$E\big(e^{\ln(1-u)N(t+s)} \,\big|\, \mathcal{F}_t\big) = E\big(e^{\ln(1-u)N(t) + \ln(1-u)(N(t+s) - N(t))} \,\big|\, \mathcal{F}_t\big)$$
$$= e^{\ln(1-u)N(t)} E\big(e^{\ln(1-u)(N(t+s) - N(t))} \,\big|\, \mathcal{F}_t\big) \quad (\text{since } N(t) \text{ is } \mathcal{F}_t\text{-measurable})$$
$$= e^{\ln(1-u)N(t)} E\big(e^{\ln(1-u)(N(t+s) - N(t))}\big) \quad (\text{increment is independent of } \mathcal{F}_t)$$
$$= e^{\ln(1-u)N(t)} e^{-u\lambda s}, \quad \text{since } N(t+s) - N(t) \text{ is Poisson}(\lambda s).$$

Multiplying both sides by $e^{u\lambda(t+s)}$, the martingale property follows.


Using the exponential martingale, it can be shown that the Poisson process has the strong Markov property.
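A necessary consequence of the exponential martingale property is that $E\big((1-u)^{N(t)}\big)\,e^{u\lambda t} = 1$ for every $t$, which is the moment generating function (3.31) evaluated at $\ln(1-u)$. The Python sketch below checks this numerically by summing the Poisson probabilities; the function name, the truncation level `kmax` and the parameter values are our assumptions.

```python
import math

def exp_martingale_mean(u, lam, t, kmax=200):
    """E[(1-u)^N(t)] * exp(u*lam*t) for N(t) ~ Poisson(lam*t),
    with the series over k truncated at kmax terms."""
    m = lam * t
    term = math.exp(-m)          # P(N(t) = 0)
    total = 0.0
    for k in range(kmax):
        total += (1 - u) ** k * term
        term *= m / (k + 1)      # P(N(t) = k+1) obtained from P(N(t) = k)
    return total * math.exp(u * m)

print(exp_martingale_mean(0.3, 2.0, 3.0))   # close to 1
```

A constant mean is only a necessary condition for a martingale; the full property is the conditional statement shown in the proof.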


3.15 Exercises 


Exercise 3.1: Derive the moment generating function of the multivariate Normal distribution $N(\mu, \Sigma)$.


Exercise 3.2: Show that for a non-negative random variable $X$, $EX = \int_0^\infty P(X \ge x)dx$. Hint: use $EX = \int_0^\infty x\,dF(x) = \int_0^\infty \int_0^x dt\,dF(x)$ and change the order of integration.


Exercise 3.3: Show that if $X \ge 0$, $EX \le \sum_{n=0}^{\infty} P(X > n)$.


Exercise 3.4: Let B(t) be a Brownian motion. Show that the following 
processes are Brownian motions on [0, T]. 


1. $X(t) = -B(t)$.
2. $X(t) = B(T - t) - B(T)$, where $T < +\infty$.
3. $X(t) = cB(t/c^2)$, where $T < +\infty$.
4. $X(t) = tB(1/t)$, $t > 0$, and $X(0) = 0$.


Hint: Check the defining properties. Alternatively, show that the process is a Gaussian process with correlation function $\min(s, t)$. Alternatively, show that the process is a continuous martingale with quadratic variation $t$ (this is Lévy's characterization, and will be proven later).
Exercise 3.5: Let $B(t)$ and $W(t)$ be two independent Brownian motions. Show that $X(t) = (B(t) + W(t))/\sqrt{2}$ is also a Brownian motion. Find correlation between $B(t)$ and $X(t)$.


Exercise 3.6: Let $B(t)$ be an $n$-dimensional Brownian motion, and $x$ is a non-random vector in $\mathbb{R}^n$ with length 1, $|x|^2 = 1$. Show that $W(t) = x \cdot B(t)$ is a (one-dimensional) Brownian motion.


Exercise 3.7: Let $B(t)$ be a Brownian motion and $0 < t_1 < \ldots < t_n$. Give a matrix $A$, such that $(B(t_1), B(t_2), \ldots, B(t_n))^T = A(Z_1, \ldots, Z_n)^T$, where $Z_i$'s are standard Normal variables. Hence give the covariance matrix of $(B(t_1), B(t_2), \ldots, B(t_n))$. Here $T$ stands for transpose and the vectors are column vectors.


Exercise 3.8: Let B(t) be a Brownian motion and 0 < s < t. Show that the 
conditional distribution of B(s) given B(t) = b is Normal and give its mean 
and variance. 


Exercise 3.9: Show that the random variables M(t) and |B(t)| have the 
same distribution. 


Exercise 3.10: Show that the moments of order $r$ of the hitting time $T_x$ are finite, $E(T_x^r) < \infty$, if and only if $r < 1/2$.


Exercise 3.11: Derive the distribution of the maximum M(t) from the joint 
distribution of (B(t), M(t)). 


Exercise 3.12: By considering $-B(t)$, derive the joint distribution of $B(t)$ and $m(t) = \min_{s \le t} B(s)$.


Exercise 3.13: Show that the random variables M(t), |B(t)| and M(t) — B(t) 
have the same distributions, cf. Exercise 3.9. 


Exercise 3.14: The first zero of Brownian motion started at zero is 0. What 
is the second zero? 


Exercise 3.15: Let T be the last time before 1 a Brownian motion visits 0. 
Explain why X(t) = B(t+T)— B(T) = B(t+ T) is not a Brownian motion. 


Exercise 3.16: Formulate the Law of Large Numbers and the Law of Iterated 
Logarithm for Brownian motion near zero. 


Exercise 3.17: Let $B(t)$ be Brownian motion. Show that $e^{-\alpha t}B(e^{2\alpha t})$ is a Gaussian process. Find its mean and covariance functions.


Exercise 3.18: Let $X(t)$ be a Gaussian process with zero mean and covariance $\gamma(s, t) = e^{-\alpha|t-s|}$. Show that $X$ has a version with continuous paths.


Exercise 3.19: Show that in a Normal Random Walk, $S_n = S_0 + \sum_{i=1}^{n} \xi_i$ when $\xi_i$'s are standard Normal random variables, $e^{uS_n - nu^2/2}$ is a martingale.


Exercise 3.20: Let $S_n = S_0 + \sum_{i=1}^{n} \xi_i$ be a Random Walk, with $P(\xi_1 = 1) = p$, $P(\xi_1 = -1) = 1 - p$. Show that for any $\lambda$, $e^{\gamma S_n - \lambda n}$ is a martingale for the appropriate value of $\gamma$.


Exercise 3.21: The process $X_t$ is defined for discrete times $t = 1, 2, \ldots$. It can take only three values 1, 2 and 3. Its behaviour is defined by the rule: from state 1 it goes to 2, from 2 it goes to 3 and from 3 it goes back to 1. $X_1$ takes values 1, 2, 3 with equal probabilities. Show that this process is Markov. Show also that

$$P(X_3 = 3 \,|\, X_2 = 1 \text{ or } 2, X_1 = 3) \ne P(X_3 = 3 \,|\, X_2 = 1 \text{ or } 2).$$


This demonstrates that to apply Markov property we must know the present 
state of the process exactly, it is not enough to know that it can take one of 
the two (or more) possible values. 


Exercise 3.22: A discrete-time process $X(t)$, $t = 0, 1, 2, \ldots$, is said to be autoregressive of order $p$ (AR($p$)) if there exist $a_1, \ldots, a_p \in \mathbb{R}$, and a white noise $Z(t)$ ($E(Z(t)) = 0$, $E(Z^2(t)) = \sigma^2$ and, for $s > 0$, $E(Z(t)Z(t+s)) = 0$) such that

$$X(t) = \sum_{s=1}^{p} a_s X(t - s) + Z(t).$$

1. Show that $X(t)$ is Markovian if and only if $p = 1$.
2. Show that if $X(t)$ is AR(2), then $Y(t) = (X(t), X(t+1))$ is Markovian.
3. Suppose that $Z(t)$ is a Gaussian process. Write the transition probability function of an AR(1) process $X(t)$.


Exercise 3.23: The distribution of a random variable $\tau$ has the lack of memory property if $P(\tau > a + b \,|\, \tau > a) = P(\tau > b)$. Verify the lack of memory property for the exponential $\exp(\lambda)$ distribution. Show that if $\tau$ has the lack of memory property and a density, then it has an exponential distribution.


Chapter 4 


Brownian Motion Calculus 


In this chapter stochastic integrals with respect to Brownian motion are introduced and their properties are given. They are also called Itô integrals, and the corresponding calculus Itô calculus.


4.1 Definition of Itô Integral


Our goal is to define the stochastic integral $\int_0^T X(t)dB(t)$, also denoted $\int X dB$ or $X \cdot B$. This integral should have the property that if $X(t) = 1$ then $\int_0^T dB(t) = B(T) - B(0)$. Similarly, if $X(t)$ is a constant $c$, then the integral should be $c(B(T) - B(0))$. In this way we can integrate constant processes with respect to $B$. The integral over $(0, T]$ should be the sum of integrals over two subintervals $(0, a]$ and $(a, T]$. Thus if $X(t)$ takes two values $c_1$ on $(0, a]$, and $c_2$ on $(a, T]$, then the integral of $X$ with respect to $B$ is easily defined. In this way the integral is defined for simple processes, that is, processes which are constant on finitely many intervals. By the limiting procedure the integral is then defined for more general processes.


Itô Integral of Simple Processes


Consider first integrals of a non-random simple process $X(t)$, which is a function of $t$ and does not depend on $B(t)$. By definition a simple non-random process $X(t)$ is a process for which there exist times $0 = t_0 < t_1 < \ldots < t_n = T$ and constants $c_0, c_1, \ldots, c_{n-1}$, such that

$$X(t) = c_0 I_0(t) + \sum_{i=0}^{n-1} c_i I_{(t_i, t_{i+1}]}(t). \qquad (4.1)$$
The Itô integral $\int_0^T X(t)dB(t)$ is defined as a sum

$$\int_0^T X(t)dB(t) = \sum_{i=0}^{n-1} c_i \big(B(t_{i+1}) - B(t_i)\big). \qquad (4.2)$$

It is easy to see by using the independence property of Brownian increments that the integral, which is the sum in (4.2), is a Gaussian random variable with mean zero and variance

$$\mathrm{Var}\Big(\int_0^T X(t)dB(t)\Big) = \mathrm{Var}\Big(\sum_{i=0}^{n-1} c_i \big(B(t_{i+1}) - B(t_i)\big)\Big) = \sum_{i=0}^{n-1} \mathrm{Var}\big(c_i (B(t_{i+1}) - B(t_i))\big) = \sum_{i=0}^{n-1} c_i^2 (t_{i+1} - t_i).$$

Example 4.1: Let $X(t) = -1$ for $0 \le t \le 1$, $X(t) = 1$ for $1 < t \le 2$, and $X(t) = 2$ for $2 < t \le 3$. Then (note that $t_i = 0, 1, 2, 3$, $c_i = X(t_{i+1})$, $c_0 = -1$, $c_1 = 1$, $c_2 = 2$)

$$\int_0^3 X(t)dB(t) = c_0(B(1) - B(0)) + c_1(B(2) - B(1)) + c_2(B(3) - B(2))$$
$$= -B(1) + (B(2) - B(1)) + 2(B(3) - B(2)) = 2B(3) - B(2) - 2B(1).$$

Its distribution is $N(0, 6)$, either directly as a sum of independent $N(0, 1) + N(0, 1) + N(0, 4)$ or by using the result above.
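The $N(0, 6)$ claim of Example 4.1 can be checked by simulation. The Python sketch below draws independent Brownian increments, forms the sum $\sum_i c_i(B(t_{i+1}) - B(t_i))$, and compares the sample variance with $\sum_i c_i^2(t_{i+1} - t_i)$; function names, seed and sample size are our choices.

```python
import random

def simple_integral(c, times, rng):
    """One sample of sum_i c[i] * (B(t[i+1]) - B(t[i])) for non-random c."""
    total = 0.0
    for ci, t0, t1 in zip(c, times, times[1:]):
        total += ci * rng.gauss(0.0, (t1 - t0) ** 0.5)   # increment ~ N(0, t1 - t0)
    return total

def exact_variance(c, times):
    """sum_i c_i^2 (t_{i+1} - t_i), the variance of the simple integral."""
    return sum(ci ** 2 * (t1 - t0) for ci, t0, t1 in zip(c, times, times[1:]))

c, times = [-1, 1, 2], [0, 1, 2, 3]       # the process of Example 4.1
rng = random.Random(0)
samples = [simple_integral(c, times, rng) for _ in range(50000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(exact_variance(c, times), mean, var)   # 6, about 0, about 6
```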


By taking limits of simple non-random processes, more general but still only 
non-random processes can be integrated with respect to Brownian motion. 

To integrate random processes, it is important to allow for constants $c_i$ in (4.1) to be random. If $c_i$'s are replaced by random variables $\xi_i$'s, then in order to have convenient properties of the integral the random variables $\xi_i$ are allowed to depend on the values of $B(t)$ for $t \le t_i$, but not on future values of $B(t)$ for $t > t_i$. If $\mathcal{F}_t$ is the $\sigma$-field generated by Brownian motion up to time $t$, then $\xi_i$ is $\mathcal{F}_{t_i}$-measurable. The approach of defining the integral by approximation can be carried out for the class of adapted processes $X(t)$, $0 \le t \le T$.


Definition 4.1 A process $X$ is called adapted to the filtration $\mathbb{F} = (\mathcal{F}_t)$, if for all $t$, $X(t)$ is $\mathcal{F}_t$-measurable.


Remark 4.1: In order that the integral has desirable properties, in particular that the expectation and the integral can be interchanged (by Fubini's theorem), the requirement that $X$ is adapted is too weak, and a stronger condition, that of a progressive (progressively measurable) process is needed. $X$ is progressive if it is a measurable function in the pair of variables $(t, \omega)$, i.e. $\mathcal{B}([0, t]) \times \mathcal{F}_t$ measurable as a map from $[0, t] \times \Omega$ into $\mathbb{R}$. It can be seen that every adapted right-continuous with left limits or left-continuous with right limits (regular, cadlag) process is progressive. Since it is easier to understand what is meant by a regular adapted process, we use 'regular adapted' terminology without further reference to progressive or measurable in $(t, \omega)$ processes.


Definition 4.2 A process $X = \{X(t), 0 \le t \le T\}$ is called a simple adapted process if there exist times $0 = t_0 < t_1 < \ldots < t_n = T$ and random variables $\xi_0, \xi_1, \ldots, \xi_{n-1}$, such that $\xi_0$ is a constant, $\xi_i$ is $\mathcal{F}_{t_i}$-measurable (depends on the values of $B(t)$ for $t \le t_i$, but not on values of $B(t)$ for $t > t_i$), and $E(\xi_i^2) < \infty$, $i = 0, \ldots, n-1$; such that

$$X(t) = \xi_0 I_0(t) + \sum_{i=0}^{n-1} \xi_i I_{(t_i, t_{i+1}]}(t). \qquad (4.3)$$

For simple adapted processes the Itô integral $\int_0^T X dB$ is defined as a sum

$$\int_0^T X(t)dB(t) = \sum_{i=0}^{n-1} \xi_i \big(B(t_{i+1}) - B(t_i)\big). \qquad (4.4)$$

Note that when $\xi_i$'s are random, the integral need not have a Normal distribution, as in the case of non-random $c_i$'s.


Remark 4.2: Simple adapted processes are defined as left-continuous step 
functions. One can take right-continuous functions. However, when the stochas- 
tic integral is defined with respect to general martingales, other than the Brow- 
nian motion, only left-continuous functions are taken. 


Properties of the Itô Integral of Simple Adapted Processes


Here we establish main properties of the Itô integral of simple processes. These properties carry over to the Itô integral of general processes.


1. Linearity. If $X(t)$ and $Y(t)$ are simple processes and $\alpha$ and $\beta$ are some constants then

$$\int_0^T (\alpha X(t) + \beta Y(t))\,dB(t) = \alpha \int_0^T X(t)dB(t) + \beta \int_0^T Y(t)dB(t).$$

2. For the indicator function of an interval $I_{(a,b]}(t)$ ($I_{(a,b]}(t) = 1$ when $t \in (a, b]$, and zero otherwise)

$$\int_0^T I_{(a,b]}(t)dB(t) = B(b) - B(a), \qquad \int_0^T I_{(a,b]}(t)X(t)dB(t) = \int_a^b X(t)dB(t).$$

3. Zero mean property. $E\int_0^T X(t)dB(t) = 0$.

4. Isometry property.

$$E\Big(\int_0^T X(t)dB(t)\Big)^2 = \int_0^T E(X^2(t))dt. \qquad (4.5)$$


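PROOF: Properties 1 and 2 are verified directly from the definition. Proof of linearity of the integral follows from the fact that a linear combination of simple processes is again a simple process, and so is $I_{(a,b]}(t)X(t)$.

Since $\xi_i$'s are square integrable, then by the Cauchy-Schwarz inequality

$$E|\xi_i (B(t_{i+1}) - B(t_i))| \le \sqrt{E(\xi_i^2)\,E(B(t_{i+1}) - B(t_i))^2} < \infty,$$

which implies that

$$E\Big|\sum_{i=0}^{n-1} \xi_i (B(t_{i+1}) - B(t_i))\Big| \le \sum_{i=0}^{n-1} E|\xi_i (B(t_{i+1}) - B(t_i))| < \infty, \qquad (4.6)$$

and the stochastic integral has expectation. By the martingale property of Brownian motion, using that $\xi_i$'s are $\mathcal{F}_{t_i}$-measurable

$$E\big(\xi_i (B(t_{i+1}) - B(t_i)) \,\big|\, \mathcal{F}_{t_i}\big) = \xi_i E\big((B(t_{i+1}) - B(t_i)) \,\big|\, \mathcal{F}_{t_i}\big) = 0, \qquad (4.7)$$

and it follows that $E(\xi_i (B(t_{i+1}) - B(t_i))) = 0$, which implies Property 3.
To prove Property 4, write the square as the double sum

$$E\Big(\sum_{i=0}^{n-1} \xi_i (B(t_{i+1}) - B(t_i))\Big)^2 = \sum_{i=0}^{n-1} E\big(\xi_i^2 (B(t_{i+1}) - B(t_i))^2\big) + 2\sum_{i<j} E\big(\xi_i \xi_j (B(t_{i+1}) - B(t_i))(B(t_{j+1}) - B(t_j))\big). \qquad (4.8)$$

Using the martingale property of Brownian motion,

$$\sum_{i=0}^{n-1} E\big(\xi_i^2 (B(t_{i+1}) - B(t_i))^2\big) = \sum_{i=0}^{n-1} E\Big(E\big(\xi_i^2 (B(t_{i+1}) - B(t_i))^2 \,\big|\, \mathcal{F}_{t_i}\big)\Big)$$
$$= \sum_{i=0}^{n-1} E\Big(\xi_i^2 E\big((B(t_{i+1}) - B(t_i))^2 \,\big|\, \mathcal{F}_{t_i}\big)\Big) = \sum_{i=0}^{n-1} E(\xi_i^2)(t_{i+1} - t_i).$$

The last sum is exactly $\int_0^T E(X^2(t))dt$, since $X(t) = \xi_i$ on $(t_i, t_{i+1}]$. By conditioning, in a similar way, we obtain for $i < j$,

$$E\big(\xi_i \xi_j (B(t_{i+1}) - B(t_i))(B(t_{j+1}) - B(t_j))\big) = 0,$$

so that the sum $\sum_{i<j}$ in (4.8) vanishes, and Property 4 is proved.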
Itô Integral of Adapted Processes


Let $X^n(t)$ be a sequence of simple processes convergent in probability to the process $X(t)$. Then, under some conditions, the sequence of their integrals $\int_0^T X^n(t)dB(t)$ also converges in probability to a limit $J$. The random variable $J$ is taken to be the integral $\int_0^T X(t)dB(t)$.


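Example 4.2: We find $\int_0^T B(t)dB(t)$.
Let $0 = t_0^n < t_1^n < t_2^n < \ldots < t_n^n = T$ be a partition of $[0, T]$, and let

$$X^n(t) = \sum_{i=0}^{n-1} B(t_i^n) I_{(t_i^n, t_{i+1}^n]}(t).$$

Then for any $n$, $X^n(t)$ is a simple adapted process. (Here $\xi_i^n = B(t_i^n)$.) By the continuity of $B(t)$, $\lim_{n\to\infty} X^n(t) = B(t)$ almost surely as $\max_i(t_{i+1}^n - t_i^n) \to 0$. The Itô integral of the simple function $X^n(t)$ is given by

$$\int_0^T X^n(t)dB(t) = \sum_{i=0}^{n-1} B(t_i^n)\big(B(t_{i+1}^n) - B(t_i^n)\big).$$

We show that this sequence of integrals converges in probability to $J = \frac{1}{2}B^2(T) - \frac{1}{2}T$.

Adding and subtracting $B^2(t_{i+1}^n)$, we obtain

$$B(t_i^n)\big(B(t_{i+1}^n) - B(t_i^n)\big) = \frac{1}{2}\Big(B^2(t_{i+1}^n) - B^2(t_i^n) - \big(B(t_{i+1}^n) - B(t_i^n)\big)^2\Big),$$

$$\int_0^T X^n(t)dB(t) = \frac{1}{2}\sum_{i=0}^{n-1}\big(B^2(t_{i+1}^n) - B^2(t_i^n)\big) - \frac{1}{2}\sum_{i=0}^{n-1}\big(B(t_{i+1}^n) - B(t_i^n)\big)^2$$
$$= \frac{1}{2}B^2(T) - \frac{1}{2}B^2(0) - \frac{1}{2}\sum_{i=0}^{n-1}\big(B(t_{i+1}^n) - B(t_i^n)\big)^2,$$

since the first sum is a telescopic one. By the definition of the quadratic variation of Brownian motion the second sum converges in probability to $T$. Therefore $\int_0^T X^n(t)dB(t)$ converges in probability to the limit $J$

$$\int_0^T B(t)dB(t) = J = \lim_{n\to\infty}\int_0^T X^n(t)dB(t) = \frac{1}{2}B^2(T) - \frac{1}{2}T. \qquad (4.9)$$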
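The convergence in Example 4.2 can be observed numerically. The Python sketch below simulates a Brownian path on a uniform partition of $[0, T]$, forms the left-point sum $\sum_i B(t_i)(B(t_{i+1}) - B(t_i))$, and prints it next to $\frac{1}{2}B^2(T) - \frac{1}{2}T$; the function name, the seed and the partition sizes are our choices.

```python
import random

def ito_sum_B_dB(T, n, rng):
    """Left-point sum of B dB over a uniform partition of [0, T] into n steps.
    Returns the sum together with the simulated endpoint B(T)."""
    dt = T / n
    b, total = 0.0, 0.0
    for _ in range(n):
        db = rng.gauss(0.0, dt ** 0.5)   # Brownian increment ~ N(0, dt)
        total += b * db                  # integrand taken at the left endpoint
        b += db
    return total, b

rng = random.Random(42)
T = 1.0
for n in (10, 1000, 100000):
    total, bT = ito_sum_B_dB(T, n, rng)
    print(n, total, 0.5 * bT ** 2 - 0.5 * T)
```

Each value of $n$ uses a fresh path, so the two printed columns should be compared row by row; their difference is $\frac{1}{2}\big(T - \sum_i (\Delta B_i)^2\big)$, which shrinks as the partition is refined.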


Remark 4.3: 


• If $X(t)$ is a differentiable function (more generally, a function of finite variation), then the stochastic integral $\int_0^T X(t)dB(t)$ can be defined by formally using the integration by parts:

$$\int_0^T X(t)dB(t) = X(T)B(T) - X(0)B(0) - \int_0^T B(t)dX(t)$$

(Paley, Wiener and Zygmund Math. Z. 37, 1933, 647-668.)
But this approach fails when $X(t)$ depends on $B(t)$.


• Brownian motion has no derivative, but it has a generalized derivative as a Schwartz distribution. It is defined by the following relation. For a smooth function $g$ with a compact support (zero outside a finite interval)

$$\int g(t)B'(t)dt := -\int B(t)g'(t)dt.$$

But this approach fails when $g(t)$ depends on $B(t)$.


• For simple processes the Itô integral is defined for each $\omega$, path by path, but in general, this is not possible. For example, $\int_0^T B(\omega, t)dB(\omega, t)$ is not defined, whereas $\big(\int_0^T B(t)dB(t)\big)(\omega) = J(\omega)$ is defined as a limit in probability of integrals (sums) of simple processes.


Theorem 4.3 Let $X(t)$ be a regular adapted process such that with probability one $\int_0^T X^2(t)dt < \infty$. Then Itô integral $\int_0^T X(t)dB(t)$ is defined and has the following properties.


1. Linearity. If Itô integrals of $X(t)$ and $Y(t)$ are defined and $\alpha$ and $\beta$ are some constants then

$$\int_0^T (\alpha X(t) + \beta Y(t))\,dB(t) = \alpha\int_0^T X(t)dB(t) + \beta\int_0^T Y(t)dB(t).$$

2. $\int_0^T X(t)I_{(a,b]}(t)dB(t) = \int_a^b X(t)dB(t)$.


The following two properties hold when the process satisfies an additional assumption

$$\int_0^T E(X^2(t))dt < \infty. \qquad (4.10)$$


3. Zero mean property. If condition (4.10) holds then

$$E\Big(\int_0^T X(t)dB(t)\Big) = 0. \qquad (4.11)$$


4. Isometry property. If condition (4.10) holds then

$$E\Big(\int_0^T X(t)dB(t)\Big)^2 = \int_0^T E(X^2(t))dt. \qquad (4.12)$$
PROOF: Only an outline of the proof is given. Firstly it is shown that the Itô integral is well defined for adapted processes that satisfy an additional assumption (4.10). Such processes can be approximated by simple processes

$$X^n(t) = X(0) + \sum_{i=0}^{n-1} X(t_i^n) I_{(t_i^n, t_{i+1}^n]}(t), \qquad (4.13)$$

where $\{t_i^n\}$ is a partition of $[0, T]$ with $\delta_n = \max_i(t_{i+1}^n - t_i^n) \to 0$ as $n \to \infty$. In this sum only one term is different from zero, corresponding to the interval in the partition containing $t$. The process $X^n(t)$ equals $X(t)$ for all the points in the partition, but may differ in each small interval $(t_i^n, t_{i+1}^n)$. Now,

$$\int_0^T (X^n(t))^2 dt = \sum_{i=0}^{n-1} X^2(t_i^n)(t_{i+1}^n - t_i^n) \to \int_0^T X^2(t)dt, \qquad (4.14)$$

as $n \to \infty$, because the sums are Riemann sums for the corresponding integrals. Moreover, it is possible to show that,

$$\int_0^T E(X^n(t))^2 dt = \sum_{k=0}^{n-1} E X^2(t_k^n)(t_{k+1}^n - t_k^n) \to \int_0^T E X^2(t)dt, \qquad (4.15)$$

$$\lim_{n\to\infty}\int_0^T E(X^n(t) - X(t))^2 dt = 0. \qquad (4.16)$$

Denote the Itô integral of the simple process by

$$J_n = \int_0^T X^n(t)dB(t) = \sum_{k=0}^{n-1} X(t_k^n)\big(B(t_{k+1}^n) - B(t_k^n)\big). \qquad (4.17)$$

The condition (4.10) for $J_n$ holds, so that $E(J_n) = 0$ and by the isometry property $E(J_n^2)$ is given by the sum in (4.15). Using the isometry property, (4.16) implies that

$$E(J_n - J_m)^2 = E\Big(\int_0^T X^n(t)dB(t) - \int_0^T X^m(t)dB(t)\Big)^2 = E\Big(\int_0^T (X^n(t) - X^m(t))dB(t)\Big)^2$$
$$= E\int_0^T (X^n(t) - X^m(t))^2 dt \le 2E\int_0^T (X^n(t) - X(t))^2 dt + 2E\int_0^T (X(t) - X^m(t))^2 dt \to 0, \qquad (4.18)$$

as $n, m \to \infty$. The space $L^2$ of random variables with zero mean, finite second moments and convergence in the mean-square is complete, and (4.18) shows that $J_n$ form a Cauchy sequence in this space. This implies that there is an element $J$, such that $J_n \to J$ in $L^2$. This limit $J$ is taken to be the Itô integral $\int_0^T X(t)dB(t)$. If we were to use another approximating sequence then, it is not hard to check, this limit does not change.

Now consider adapted processes with finite integral $\int_0^T X^2(t)dt$ but not necessarily of finite expectation. It can be shown, using the previous result, that such processes can be approximated by simple processes by taking limit in probability rather than in mean square. The sequence of corresponding Itô integrals is a Cauchy sequence in probability. It converges in probability to a limit $\int_0^T X(t)dB(t)$.

For details of proof see for example, Gihman and Skorohod (1972), Liptser 
and Shiryaev (1977), Karatzas and Shreve (1988). 


Note that Itô integrals need not have mean and variance, but when they do, the mean is zero and the variance is given by (4.12).


Corollary 4.4 If $X$ is a continuous adapted process then the Itô integral $\int_0^T X(t)dB(t)$ exists. In particular, $\int_0^T f(B(t))dB(t)$, where $f$ is a continuous function on $\mathbb{R}$, is well defined.


PROOF: Since any path of $X(t)$ is a continuous function, $\int_0^T X^2(t)dt < \infty$, and the result follows by Theorem 4.3. If $f$ is continuous on $\mathbb{R}$, then $f(B(t))$ is continuous on $[0, T]$.


Remark 4.4: It follows from the proof that the sums (4.17) approximate the Itô integral $\int_0^T X(t)dB(t)$

$$\sum_{i=0}^{n-1} X(t_i^n)\big(B(t_{i+1}^n) - B(t_i^n)\big).$$

In approximation of the Stieltjes integral by sums, the function $f$ on the interval $[t_i, t_{i+1}]$ is replaced by its value at some middle point $\theta_i \in [t_i, t_{i+1}]$, whereas in the above approximations for the Itô integral, the left most point must be taken for $\theta_i = t_i$, otherwise the process may not be adapted.

It is possible to define an integral (different to Itô integral) when $\theta_i$ is chosen to be an interior point of the interval, $\theta_i = \lambda t_i + (1 - \lambda)t_{i+1}$, for some $\lambda \in (0, 1)$. The resulting integral may depend on the choice of $\lambda$. When $\lambda = 1/2$, the Stratonovich stochastic integral results. Calculus with such integrals is closely related to the Itô calculus.
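The dependence on the evaluation point $\theta_i$ can be illustrated for the integrand $X = B$. The Python sketch below simulates the path at partition points and at interval midpoints, and forms the sums with $\theta_i = t_i$ and with $\theta_i = (t_i + t_{i+1})/2$; the function name, seed and partition size are our assumptions.

```python
import random

def left_and_midpoint_sums(T, n, rng):
    """Sums of B(theta_i) * (B(t_{i+1}) - B(t_i)) over a uniform partition,
    with theta_i the left endpoint and with theta_i the midpoint."""
    half = (T / n / 2) ** 0.5          # std dev of an increment over half a step
    b, left, mid = 0.0, 0.0, 0.0
    for _ in range(n):
        b_mid = b + rng.gauss(0.0, half)        # B at the midpoint
        b_next = b_mid + rng.gauss(0.0, half)   # B at the right endpoint
        left += b * (b_next - b)
        mid += b_mid * (b_next - b)
        b = b_next
    return left, mid, b

left, mid, bT = left_and_midpoint_sums(1.0, 50000, random.Random(5))
print(left, 0.5 * bT ** 2 - 0.5)   # left-point sum vs B(T)^2/2 - T/2
print(mid, 0.5 * bT ** 2)          # midpoint sum  vs B(T)^2/2
```

On the same path the two sums differ by approximately $T/2$, which is half the quadratic variation of $B$ on $[0, T]$.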


Remark 4.5: Note that the Itô integral does not have the monotonicity property: $X(t) \le Y(t)$ does not imply $\int_0^T X(t)dB(t) \le \int_0^T Y(t)dB(t)$. A simple counter-example is $\int_0^1 1 \times dB(t) = B(1)$. With probability half this is smaller than 0, the Itô integral of 0.


We give examples of Itô integrals of the form $\int_0^1 f(B(t))dB(t)$ with and without the first two moments.

Example 4.3: Take $f(t) = e^t$. $\int_0^1 e^{B(t)}dB(t)$ is well defined as $e^t$ is continuous on $\mathbb{R}$. Since $E\big(\int_0^1 e^{2B(t)}dt\big) = \int_0^1 E(e^{2B(t)})dt = \int_0^1 e^{2t}dt = \frac{1}{2}(e^2 - 1) < \infty$, $E\big(\int_0^1 e^{B(t)}dB(t)\big) = 0$, and $E\big(\int_0^1 e^{B(t)}dB(t)\big)^2 = \frac{1}{2}(e^2 - 1)$.


Example 4.4: Take $f(t) = t$, that is, consider $\int_0^1 B(t)dB(t)$. Then the condition (4.10) is satisfied, since $\int_0^1 E(B^2(t))dt = \int_0^1 t\,dt = 1/2 < \infty$. Thus $\int_0^1 B(t)dB(t)$ has mean zero and variance $1/2$.


Example 4.5: Take $f(t) = e^{t^2}$, that is, consider $\int_0^1 e^{B^2(t)}dB(t)$. Although this integral is well defined, the condition (4.10) fails, as $\int_0^1 E(e^{2B^2(t)})dt = \infty$, due to the fact that

$$E(e^{2B^2(t)}) = \int_{-\infty}^{\infty} e^{2x^2}\frac{1}{\sqrt{2\pi t}}e^{-\frac{x^2}{2t}}dx = \infty \quad \text{for } t \ge 1/4.$$

Therefore we can not claim that this Itô integral has finite moments. By using martingale inequalities given in the sequel, it can be shown that the expectation of the Itô integral does not exist.


Example 4.6: Let $J = \int_0^1 t\,dB(t)$. We calculate $E(J)$ and $\mathrm{Var}(J)$.

Since $\int_0^1 t^2 dt < \infty$, the Itô integral is defined. Since the integrand $t$ is non-random, condition (4.10) holds and the integral has the first two moments, $E(J) = 0$, and $E(J^2) = \int_0^1 t^2 dt = 1/3$.


Example 4.7: For what values of $\alpha$ is the integral $\int_0^1 (1 - t)^{-\alpha}dB(t)$ defined?

For the Itô integral to be defined it must have $\int_0^1 (1 - t)^{-2\alpha}dt < \infty$. This gives $\alpha < 1/2$.
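For a non-random integrand, such as the integrand $t$ on $[0, 1]$ with $J = \int_0^1 t\,dB(t)$, the variance of a simple-process approximation is an ordinary Riemann sum, so the value $1/3$ can be approached exactly. The Python sketch below evaluates $\sum_i t_i^2 (t_{i+1} - t_i)$ on the uniform partition $t_i = i/n$; the function name is ours.

```python
from fractions import Fraction

def variance_of_simple_approximation(n):
    """Variance of sum_i t_i (B(t_{i+1}) - B(t_i)) on [0, 1] with t_i = i/n,
    which equals sum_i t_i^2 (t_{i+1} - t_i)."""
    return sum(Fraction(i, n) ** 2 * Fraction(1, n) for i in range(n))

for n in (10, 100, 1000):
    print(n, float(variance_of_simple_approximation(n)))   # approaches 1/3
```

The left-point sums increase towards $\int_0^1 t^2 dt = 1/3$ as the partition is refined, in line with the isometry property.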


A consequence of the isometry property is the expectation of the product of 
two Itô integrals. 


Theorem 4.5 Let $X(t)$ and $Y(t)$ be regular adapted processes, such that $E\int_0^T X(t)^2 dt < \infty$ and $E\int_0^T Y(t)^2 dt < \infty$. Then

$$E\Big(\int_0^T X(t)dB(t)\int_0^T Y(t)dB(t)\Big) = \int_0^T E(X(t)Y(t))dt. \qquad (4.19)$$


PROOF: Denote the Itô integrals $I_1 = \int_0^T X(t)dB(t)$, $I_2 = \int_0^T Y(t)dB(t)$. Write their product by using the identity $I_1 I_2 = (I_1 + I_2)^2/2 - I_1^2/2 - I_2^2/2$. Then use the isometry property.
4.2 Itô Integral Process


Let $X$ be a regular adapted process, such that $\int_0^T X^2(s)ds < \infty$ with probability one, so that $\int_0^t X(s)dB(s)$ is defined for any $t \le T$. Since it is a random variable for any fixed $t$, $\int_0^t X(s)dB(s)$ as a function of the upper limit $t$ defines a stochastic process

$$Y(t) = \int_0^t X(s)dB(s). \qquad (4.20)$$

It is possible to show that there is a version of the Itô integral $Y(t)$ with continuous sample paths. It is always assumed that the continuous version of the Itô integral is taken. It will be seen later in this section that the Itô integral has a positive quadratic variation and infinite variation.


Martingale Property of the Itô Integral


It is intuitively clear from the construction of Itô integrals that they are adapted. To see this more formally, Itô integrals of simple processes are clearly adapted, and also continuous. Since $Y(t)$ is a limit of integrals of simple processes, it is itself adapted.

Suppose that in addition to the condition $\int_0^T X^2(s)ds < \infty$, condition (4.10) holds, $\int_0^T EX^2(s)ds < \infty$. (The latter implies the former by Fubini's theorem.) Then $Y(t) = \int_0^t X(s)dB(s)$, $0 \le t \le T$, is defined and possesses first two moments. It can be shown, first for simple processes and then in general, that for $s < t$,

$$E\Big(\int_s^t X(u)dB(u) \,\Big|\, \mathcal{F}_s\Big) = 0.$$

Thus

$$E(Y(t)|\mathcal{F}_s) = E\Big(\int_0^t X(u)dB(u) \,\Big|\, \mathcal{F}_s\Big) = \int_0^s X(u)dB(u) + E\Big(\int_s^t X(u)dB(u) \,\Big|\, \mathcal{F}_s\Big) = \int_0^s X(u)dB(u) = Y(s).$$

Therefore $Y(t)$ is a martingale. The second moments of $Y(t)$ are given by the isometry property,

$$E\Big(\int_0^t X(s)dB(s)\Big)^2 = \int_0^t EX^2(s)ds. \qquad (4.21)$$

This shows that $\sup_{t \le T} E(Y^2(t)) = \int_0^T EX^2(s)ds < \infty$.
Definition 4.6 A martingale is called square integrable on [0,T] if its second
moments are bounded.


Thus we have 


Theorem 4.7 Let X(t) be an adapted process such that $\int_0^T EX^2(s)\,ds < \infty$.
Then $Y(t) = \int_0^t X(s)\,dB(s)$, $0 \le t \le T$, is a continuous zero mean square
integrable martingale.

Remark 4.6: If $\int_0^T EX^2(s)\,ds = \infty$, then the Itô integral $\int_0^t X(s)\,dB(s)$ may
fail to be a martingale, but it is always a local martingale (see Chapter 7 for
definition and properties).


Theorem 4.7 above provides a way of constructing martingales. 


Corollary 4.8 For any bounded function f on R, $\int_0^t f(B(s))\,dB(s)$ is a square
integrable martingale.

PROOF: X(t) = f(B(t)) is adapted, and since $|f(x)| < K$ for some constant
$K > 0$, $\int_0^T Ef^2(B(s))\,ds \le K^2 T$. The result follows by Theorem 4.7.


Quadratic Variation and Covariation of Itô Integrals

The Itô integral $Y(t) = \int_0^t X(s)\,dB(s)$, $0 \le t \le T$, is a random function of t.
It is continuous and adapted. The quadratic variation of Y is defined by (see
(1.13))

$$[Y,Y](t) = \lim \sum_{i=0}^{n-1} \big(Y(t^n_{i+1}) - Y(t^n_i)\big)^2, \tag{4.22}$$

where for each n, $\{t^n_i\}_{i=0}^n$ is a partition of [0,t], and the limit is in probability,
taken over all partitions with $\delta_n = \max_i (t^n_{i+1} - t^n_i) \to 0$ as $n \to \infty$.

Theorem 4.9 The quadratic variation of the Itô integral $\int_0^t X(s)\,dB(s)$ is
given by

$$\left[\int_0^{\cdot} X(s)\,dB(s), \int_0^{\cdot} X(s)\,dB(s)\right](t) = \int_0^t X^2(s)\,ds. \tag{4.23}$$

It is easy to verify the result for simple processes, see the Example below. The
general case can be proved by approximations by simple processes.


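Example 4.8: For simplicity, suppose that X takes only two different values on
[0,1]: $\xi_0$ on [0,1/2] and $\xi_1$ on (1/2,1],

$$X(t) = \xi_0 I_{[0,1/2]}(t) + \xi_1 I_{(1/2,1]}(t).$$

It is easy to see that

$$Y(t) = \int_0^t X(s)\,dB(s) = \begin{cases} \xi_0 B(t) & \text{if } t \le 1/2, \\ \xi_0 B(1/2) + \xi_1\big(B(t) - B(1/2)\big) & \text{if } t > 1/2. \end{cases}$$

Thus for any partition of [0,t],

$$Y(t^n_{i+1}) - Y(t^n_i) = \begin{cases} \xi_0\big(B(t^n_{i+1}) - B(t^n_i)\big) & \text{if } t^n_i < t^n_{i+1} \le 1/2, \\ \xi_1\big(B(t^n_{i+1}) - B(t^n_i)\big) & \text{if } 1/2 \le t^n_i < t^n_{i+1}. \end{cases}$$

Including 1/2 in a partition, one can verify that for $t \le 1/2$

$$\begin{aligned}
[Y,Y](t) &= \lim \sum_{i=0}^{n-1} \big(Y(t^n_{i+1}) - Y(t^n_i)\big)^2 \\
&= \xi_0^2 \lim \sum_{i=0}^{n-1} \big(B(t^n_{i+1}) - B(t^n_i)\big)^2 = \xi_0^2 [B,B](t) = \xi_0^2 t = \int_0^t X^2(s)\,ds,
\end{aligned}$$

and for $t > 1/2$

$$\begin{aligned}
[Y,Y](t) &= \lim \sum_{i=0}^{n-1} \big(Y(t^n_{i+1}) - Y(t^n_i)\big)^2 \\
&= \xi_0^2 \lim \sum_{t_i < 1/2} \big(B(t_{i+1}) - B(t_i)\big)^2 + \xi_1^2 \lim \sum_{t_i \ge 1/2} \big(B(t_{i+1}) - B(t_i)\big)^2 \\
&= \xi_0^2 [B,B](1/2) + \xi_1^2 [B,B]((1/2,t]) = \int_0^t X^2(s)\,ds.
\end{aligned}$$

The limits above are limits in probability when $\delta_n = \max_i (t^n_{i+1} - t^n_i) \to 0$. In the
same way (4.23) is verified for any simple function.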


Example 4.9: Using the formula (4.23), the quadratic variation of the Itô integral
$\int_0^t B(s)\,dB(s)$ is

$$\left[\int_0^{\cdot} B(s)\,dB(s)\right](t) = \int_0^t B^2(s)\,ds.$$
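The statement of Example 4.9 can be illustrated on a single simulated path. The sketch below is an assumption-laden illustration rather than a proof: it approximates the increments of $Y(t) = \int_0^t B(s)\,dB(s)$ over a fine uniform partition by $B(t_i)(B(t_{i+1}) - B(t_i))$, and compares the sum of their squares with a Riemann sum for $\int_0^t B^2(s)\,ds$ along the same path. The function name and the partition size are hypothetical choices.

```python
import math
import random


def qv_of_integral_of_b(n_steps=100000, t=1.0, seed=1):
    """Return (sum of squared increments of Y, Riemann sum of B^2 ds)
    along one simulated Brownian path on a uniform partition of [0, t]."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    b = 0.0        # current value B(t_i)
    qv = 0.0       # approximates [Y, Y](t)
    target = 0.0   # approximates int_0^t B^2(s) ds
    for _ in range(n_steps):
        db = rng.gauss(0.0, sd)
        qv += (b * db) ** 2    # increment of Y approximated by B(t_i) * dB
        target += b * b * dt
        b += db
    return qv, target


if __name__ == "__main__":
    qv, target = qv_of_integral_of_b()
    print(qv, target)  # the two numbers should nearly coincide
```

Both quantities are random, since they depend on the path, but their difference shrinks as the partition is refined.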


Corollary 4.10 If $\int_0^t X^2(s)\,ds > 0$ for all $t \le T$, then the Itô integral
$Y(t) = \int_0^t X(s)\,dB(s)$ has infinite variation on [0,t] for all $t \le T$.


PROOF: If Y(t) were of finite variation, its quadratic variation would be zero, 
leading to a contradiction. 


Like Brownian motion, the Itô integral Y(t) is a continuous but nowhere dif- 
ferentiable function of t. 
Let now $Y_1(t)$ and $Y_2(t)$ be Itô integrals of $X_1(t)$ and $X_2(t)$ with respect
to the same Brownian motion B(t). Then, clearly, the process $Y_1(t) + Y_2(t)$ is
also an Itô integral of $X_1(t) + X_2(t)$ with respect to B(t).

Quadratic covariation of $Y_1$ and $Y_2$ on [0,t] is defined by

$$[Y_1,Y_2](t) = \frac{1}{2}\Big([Y_1+Y_2, Y_1+Y_2](t) - [Y_1,Y_1](t) - [Y_2,Y_2](t)\Big). \tag{4.24}$$

By (4.23) it follows that

$$[Y_1,Y_2](t) = \int_0^t X_1(s)X_2(s)\,ds. \tag{4.25}$$

It is clear that $[Y_1,Y_2](t) = [Y_2,Y_1](t)$, and it can be seen that quadratic
covariation is given by the limit in probability of products of increments of the
processes $Y_1$ and $Y_2$ when partitions $\{t^n_i\}$ of [0,t] shrink,

$$[Y_1,Y_2](t) = \lim \sum_{i=0}^{n-1} \big(Y_1(t^n_{i+1}) - Y_1(t^n_i)\big)\big(Y_2(t^n_{i+1}) - Y_2(t^n_i)\big).$$


4.3 Itô Integral and Gaussian Processes

We have seen in Section 4.1 that the Itô integral of simple non-random
processes is a Normal random variable. It is easy to see by using moment
generating functions (see Exercise 4.3) that a limit in probability of such a
sequence is also Gaussian. This implies the following result.

Theorem 4.11 If X(t) is non-random such that $\int_0^T X^2(s)\,ds < \infty$, then its
Itô integral $Y(t) = \int_0^t X(s)\,dB(s)$ is a Gaussian process with zero mean and
covariance function given by

$$\mathrm{Cov}(Y(t), Y(t+u)) = \int_0^t X^2(s)\,ds, \quad u \ge 0. \tag{4.26}$$

Moreover, Y(t) is a square integrable martingale.

PROOF: Since the integrand is non-random, $\int_0^T EX^2(s)\,ds = \int_0^T X^2(s)\,ds < \infty$.
By the zero mean property of Itô integral, Y has zero mean. To compute the
covariance function, write $\int_0^{t+u}$ as $\int_0^t + \int_t^{t+u}$ and use the martingale property
of Y(t) to obtain

$$E\left(\int_0^t X(s)\,dB(s)\; E\Big(\int_t^{t+u} X(s)\,dB(s)\,\Big|\,\mathcal{F}_t\Big)\right) = 0.$$

Hence

$$\begin{aligned}
\mathrm{Cov}(Y(t), Y(t+u)) &= E\left(\int_0^t X(s)\,dB(s)\int_0^{t+u} X(s)\,dB(s)\right) \\
&= E\left(\int_0^t X(s)\,dB(s)\right)^2 = \int_0^t EX^2(s)\,ds = \int_0^t X^2(s)\,ds.
\end{aligned}$$

A proof of normality of integrals of non-random processes will be done later
by using Itô's formula.

Example 4.10: According to Theorem 4.11, $J = \int_0^t s\,dB(s)$ has a Normal $N(0, t^3/3)$
distribution.


Example 4.11:

Let $X(t) = 2I_{[0,1]}(t) + 3I_{(1,3]}(t) - 5I_{(3,4]}(t)$. Give the Itô integral $\int_0^4 X(t)\,dB(t)$ as
a sum of random variables, give its distribution, mean and variance. Show that the
process $M(t) = \int_0^t X(s)\,dB(s)$, $0 \le t \le 4$, is a Gaussian process and a martingale.

$$\begin{aligned}
\int_0^4 X(t)\,dB(t) &= \int_0^1 X(t)\,dB(t) + \int_1^3 X(t)\,dB(t) + \int_3^4 X(t)\,dB(t) \\
&= \int_0^1 2\,dB(t) + \int_1^3 3\,dB(t) + \int_3^4 (-5)\,dB(t) \\
&= 2\big(B(1) - B(0)\big) + 3\big(B(3) - B(1)\big) - 5\big(B(4) - B(3)\big).
\end{aligned}$$

The Itô integral is a sum of 3 independent Normal random variables (by independence
of increments of Brownian motion), 2N(0,1) + 3N(0,2) - 5N(0,1). Its distribution
is N(0,47).

The martingale property and the Gaussian property of $M(t) = \int_0^t X(s)\,dB(s)$,
$0 \le t \le 4$, follow from the independence of the increments of M(t), zero mean
increments and the Normality of the increments. $M(t) - M(s) = \int_s^t X(u)\,dB(u)$. Take
for example $0 < s < t < 1$, then $M(t) - M(s) = \int_s^t 2\,dB(u) = 2(B(t) - B(s))$,
which is independent of the Brownian motion up to time s, and has N(0, 4(t - s))
distribution.
If $0 < s < 1 < t < 3$, then $M(t) - M(s) = \int_s^t X(u)\,dB(u) = \int_s^1 X(u)\,dB(u) +
\int_1^t X(u)\,dB(u) = 2(B(1) - B(s)) + 3(B(t) - B(1))$, which is independent of the
Brownian motion up to time s, B(u), $u \le s$ (and also M(u), $u \le s$), and has
N(0, 4(1 - s) + 9(t - 1)) distribution. Other cases are similar. By Theorem 2.23
the process M(t) is Gaussian.

Independence of increments plus zero mean of increments imply the martingale
property of M(t). For example, if $0 < s < 1 < t < 3$, $E(M(t)\,|\,M(u), u \le s) =
E(M(s) + M(t) - M(s)\,|\,M(u), u \le s) = M(s) + E(M(t) - M(s)\,|\,M(u), u \le s) = M(s)$.
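The variance 47 in Example 4.11 is the sum of (value)^2 times (interval length) over the pieces of the step function. A small sketch, with hypothetical helper names, computes this and also draws samples of the integral from independent Normal increments so that the sample variance can be compared.

```python
import math
import random


def simple_integral_variance(pieces):
    """Variance of the Ito integral of a non-random step function.

    pieces is a list of (left, right, value) triples on disjoint intervals."""
    return sum(c * c * (b - a) for a, b, c in pieces)


def sample_simple_integral(pieces, rng):
    """One sample of sum c_i (B(b_i) - B(a_i)), using the fact that
    Brownian increments over disjoint intervals are independent N(0, b_i - a_i)."""
    return sum(c * rng.gauss(0.0, math.sqrt(b - a)) for a, b, c in pieces)


# The step function of Example 4.11.
PIECES = [(0, 1, 2), (1, 3, 3), (3, 4, -5)]

if __name__ == "__main__":
    print(simple_integral_variance(PIECES))  # 4*1 + 9*2 + 25*1
    rng = random.Random(2)
    xs = [sample_simple_integral(PIECES, rng) for _ in range(50000)]
    mean = sum(xs) / len(xs)
    print(mean, sum((x - mean) ** 2 for x in xs) / len(xs))
```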


If $Y(t) = \int_0^t X(t,s)\,dB(s)$ where X(t,s) depends on the upper integration
limit t, then Y(t) need not be a martingale, but remains a Gaussian process
for non-random X(t,s).
Theorem 4.12 For any $t \le T$, let X(t,s) be a regular non-random function
with $\int_0^t X^2(t,s)\,ds < \infty$. Then the process $Y(t) = \int_0^t X(t,s)\,dB(s)$ is a
Gaussian process with mean zero and covariance function, for $t, u \ge 0$,

$$\mathrm{Cov}(Y(t), Y(t+u)) = \int_0^t X(t,s)X(t+u,s)\,ds. \tag{4.27}$$

PROOF: For a fixed t, the distribution of Y(t), as that of an Itô integral of a
non-random function, is Normal with mean 0 and variance $\int_0^t X^2(t,s)\,ds$. We
don't prove the process is Gaussian (it can be seen by approximating X(t,s)
by functions of the form f(t)g(s)), but calculate the covariance. For $u > 0$

$$Y(t+u) = \int_0^t X(t+u,s)\,dB(s) + \int_t^{t+u} X(t+u,s)\,dB(s).$$

Since X(t+u,s) is non-random, the Itô integral $\int_t^{t+u} X(t+u,s)\,dB(s)$ is
independent of $\mathcal{F}_t$. Therefore

$$E\left(\int_0^t X(t,s)\,dB(s)\int_t^{t+u} X(t+u,s)\,dB(s)\right) = 0,$$

and

$$\begin{aligned}
\mathrm{Cov}(Y(t), Y(t+u)) &= E\big(Y(t)Y(t+u)\big) \\
&= E\left(\int_0^t X(t,s)\,dB(s)\int_0^t X(t+u,s)\,dB(s)\right) \\
&= \int_0^t X(t,s)X(t+u,s)\,ds,
\end{aligned} \tag{4.28}$$

where the last equality is obtained by the expectation of a product of Itô
integrals, Equation (4.19).
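As an illustration of (4.27), the covariance integral can be evaluated numerically for a concrete kernel. The sketch below uses the kernel $X(t,s) = e^{-(t-s)}$, which is an illustrative choice and does not come from the text; for it the integral has the closed form $e^{-u}(1 - e^{-2t})/2$, so the midpoint rule can be checked against it.

```python
import math


def covariance(kernel, t, u, n=20000):
    """Midpoint-rule approximation of int_0^t X(t,s) X(t+u,s) ds, formula (4.27)."""
    h = t / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += kernel(t, s) * kernel(t + u, s)
    return total * h


def exp_kernel(t, s):
    """Hypothetical example kernel X(t, s) = exp(-(t - s))."""
    return math.exp(-(t - s))


if __name__ == "__main__":
    t, u = 1.0, 0.5
    closed_form = math.exp(-u) * (1.0 - math.exp(-2.0 * t)) / 2.0
    print(covariance(exp_kernel, t, u), closed_form)
```

With u = 0 the same function returns the variance of Y(t).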


4.4 Itô’s Formula for Brownian Motion 


Itô’s formula, also known as the change of variable and the chain rule, is one 
of the main tools of stochastic calculus. It gives rise to many others, such as 
Dynkin, Feynman-Kac, and integration by parts formulae. 


Theorem 4.13 If B(t) is a Brownian motion on [0,T] and f(x) is a twice
continuously differentiable function on R, then for any $t \le T$

$$f(B(t)) = f(0) + \int_0^t f'(B(s))\,dB(s) + \frac{1}{2}\int_0^t f''(B(s))\,ds. \tag{4.29}$$
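PROOF: Note first that both integrals in (4.29) are well defined, the Itô
integral by Corollary 4.4. Let $\{t^n_i\}$ be a partition of [0,t]. Clearly,

$$f(B(t)) = f(0) + \sum_{i=0}^{n-1} \Big(f(B(t^n_{i+1})) - f(B(t^n_i))\Big).$$

Apply Taylor's formula to $f(B(t^n_{i+1})) - f(B(t^n_i))$ to obtain

$$f(B(t^n_{i+1})) - f(B(t^n_i)) = f'(B(t^n_i))\big(B(t^n_{i+1}) - B(t^n_i)\big) + \frac{1}{2} f''(\theta^n_i)\big(B(t^n_{i+1}) - B(t^n_i)\big)^2,$$

where $\theta^n_i \in (B(t^n_i), B(t^n_{i+1}))$. Thus,

$$\begin{aligned}
f(B(t)) = f(0) &+ \sum_{i=0}^{n-1} f'(B(t^n_i))\big(B(t^n_{i+1}) - B(t^n_i)\big) \\
&+ \frac{1}{2}\sum_{i=0}^{n-1} f''(\theta^n_i)\big(B(t^n_{i+1}) - B(t^n_i)\big)^2.
\end{aligned} \tag{4.30}$$

Taking limits as $\delta_n \to 0$, the first sum in (4.30) converges to the Itô integral
$\int_0^t f'(B(s))\,dB(s)$. By the theorem below the second sum in (4.30) converges
to $\int_0^t f''(B(s))\,ds$ and the result follows.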


Theorem 4.14 If g is a bounded continuous function and $\{t^n_i\}$ represents
partitions of [0,t], then for any $\theta^n_i \in (B(t^n_i), B(t^n_{i+1}))$, the limit in probability

$$\lim_{\delta_n \to 0} \sum_{i=0}^{n-1} g(\theta^n_i)\big(B(t^n_{i+1}) - B(t^n_i)\big)^2 = \int_0^t g(B(s))\,ds. \tag{4.31}$$

PROOF: Take first $\theta^n_i = B(t^n_i)$ to be the left end of the interval $(B(t^n_i), B(t^n_{i+1}))$.
We show that the sums converge in probability

$$\sum_{i=0}^{n-1} g(B(t^n_i))\big(B(t^n_{i+1}) - B(t^n_i)\big)^2 \to \int_0^t g(B(s))\,ds. \tag{4.32}$$

By continuity of g(B(t)) and definition of the integral, it follows that

$$\sum_{i=0}^{n-1} g(B(t^n_i))(t^n_{i+1} - t^n_i) \to \int_0^t g(B(s))\,ds. \tag{4.33}$$

Next we show that the difference between the sums converges to zero in $L^2$,

$$\sum_{i=0}^{n-1} g(B(t^n_i))\big(B(t^n_{i+1}) - B(t^n_i)\big)^2 - \sum_{i=0}^{n-1} g(B(t^n_i))(t^n_{i+1} - t^n_i) \to 0. \tag{4.34}$$

With $\Delta B_i = B(t^n_{i+1}) - B(t^n_i)$ and $\Delta t_i = t^n_{i+1} - t^n_i$, by using conditioning it is
seen that the cross-product term in the following expression vanishes and

$$\begin{aligned}
E\left(\sum_{i=0}^{n-1} g(B(t^n_i))\big((\Delta B_i)^2 - \Delta t_i\big)\right)^2
&= E\sum_{i=0}^{n-1} g^2(B(t^n_i))\,E\Big(\big((\Delta B_i)^2 - \Delta t_i\big)^2\,\Big|\,\mathcal{F}_{t^n_i}\Big) \\
&= 2E\sum_{i=0}^{n-1} g^2(B(t^n_i))(\Delta t_i)^2 \le 2\delta_n E\sum_{i=0}^{n-1} g^2(B(t^n_i))\Delta t_i \to 0 \text{ as } \delta_n \to 0.
\end{aligned}$$

It follows that

$$\sum_{i=0}^{n-1} g(B(t^n_i))\big((\Delta B_i)^2 - \Delta t_i\big) \to 0$$

in the square mean ($L^2$), implying (4.34) and that both sums in (4.33) and
(4.32) have the same limit, and (4.32) is established. Now for any choice of
$\theta^n_i$, we have as $\delta_n \to 0$,

$$\left|\sum_{i=0}^{n-1} \big(g(\theta^n_i) - g(B(t^n_i))\big)(\Delta B_i)^2\right| \le \max_i \big|g(\theta^n_i) - g(B(t^n_i))\big| \sum_{i=0}^{n-1} \big(B(t^n_{i+1}) - B(t^n_i)\big)^2 \to 0. \tag{4.35}$$

The first term converges to zero almost surely by continuity of g and B, and
the second converges in probability to the quadratic variation of Brownian
motion, t, implying convergence to zero in probability in (4.35). This implies
that both sums $\sum_{i=0}^{n-1} g(\theta^n_i)(\Delta B_i)^2$ and $\sum_{i=0}^{n-1} g(B(t^n_i))(\Delta B_i)^2$ have the same
limit in probability, and the result follows by (4.32).


Example 4.12: Taking $f(x) = x^m$, $m \ge 2$, we have

$$B^m(t) = m\int_0^t B^{m-1}(s)\,dB(s) + \frac{m(m-1)}{2}\int_0^t B^{m-2}(s)\,ds.$$

With m = 2,

$$B^2(t) = 2\int_0^t B(s)\,dB(s) + t.$$

Rearranging, we recover the result on the stochastic integral

$$\int_0^t B(s)\,dB(s) = \frac{1}{2}B^2(t) - \frac{1}{2}t.$$
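The formula for $\int_0^t B(s)\,dB(s)$ has an exact discrete counterpart: since $B_{i+1}^2 - B_i^2 = 2B_i\,\Delta B_i + (\Delta B_i)^2$, the left end-point sum $\sum B_i\,\Delta B_i$ equals $\frac{1}{2}B_n^2 - \frac{1}{2}\sum(\Delta B_i)^2$ for every partition, and the last sum approaches t as the partition is refined. A minimal sketch on one simulated path (function name and partition size are illustrative assumptions):

```python
import math
import random


def ito_sum_and_identity(n_steps=100000, t=1.0, seed=3):
    """Return (left end-point sum of B dB, B(t), sum of squared increments)
    along one simulated Brownian path on a uniform partition of [0, t]."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_steps)
    b = 0.0
    ito_sum = 0.0  # sum of B(t_i) * (B(t_{i+1}) - B(t_i))
    sq_sum = 0.0   # sum of (B(t_{i+1}) - B(t_i))^2, approximates [B, B](t) = t
    for _ in range(n_steps):
        db = rng.gauss(0.0, sd)
        ito_sum += b * db
        sq_sum += db * db
        b += db
    return ito_sum, b, sq_sum


if __name__ == "__main__":
    ito_sum, b_t, sq_sum = ito_sum_and_identity()
    print(ito_sum, 0.5 * b_t ** 2 - 0.5 * sq_sum)  # equal up to rounding
    print(sq_sum)                                  # close to t = 1
```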


Example 4.13: Taking $f(x) = e^x$, we have

$$e^{B(t)} = 1 + \int_0^t e^{B(s)}\,dB(s) + \frac{1}{2}\int_0^t e^{B(s)}\,ds.$$
4.5 Itô Processes and Stochastic Differentials

Definition of Itô Processes

An Itô process has the form

$$Y(t) = Y(0) + \int_0^t \mu(s)\,ds + \int_0^t \sigma(s)\,dB(s), \quad 0 \le t \le T, \tag{4.36}$$

where Y(0) is $\mathcal{F}_0$-measurable, processes $\mu(t)$ and $\sigma(t)$ are $\mathcal{F}_t$-adapted, such
that $\int_0^T |\mu(t)|\,dt < \infty$ and $\int_0^T \sigma^2(t)\,dt < \infty$.
It is said that the process Y(t) has the stochastic differential on [0,T]

$$dY(t) = \mu(t)\,dt + \sigma(t)\,dB(t), \quad 0 \le t \le T. \tag{4.37}$$

We emphasize that a representation (4.37) only has meaning by the way of
(4.36), and no other.

Note that the processes $\mu$ and $\sigma$ in (4.36) may (and often do) depend on
Y(t) or B(t) as well, or even on the whole past path of B(s), $s \le t$; for example
they may depend on the maximum of Brownian motion $\max_{s \le t} B(s)$.


Example 4.14: Example 4.12 shows that

$$B^2(t) = t + 2\int_0^t B(s)\,dB(s). \tag{4.38}$$

In other words, with $Y(t) = B^2(t)$ we can write $Y(t) = \int_0^t ds + \int_0^t 2B(s)\,dB(s)$. Thus
$\mu(s) = 1$ and $\sigma(s) = 2B(s)$. The stochastic differential of $B^2(t)$ is

$$d(B^2(t)) = 2B(t)\,dB(t) + dt.$$

The only meaning this has is the integral relation (4.38).

Example 4.15: Example 4.13 shows that $Y(t) = e^{B(t)}$ has stochastic differential

$$de^{B(t)} = e^{B(t)}\,dB(t) + \frac{1}{2}e^{B(t)}\,dt,$$

or

$$dY(t) = Y(t)\,dB(t) + \frac{1}{2}Y(t)\,dt.$$

Itô's formula (4.29) in differential notation becomes: for a $C^2$ function f

$$d(f(B(t))) = f'(B(t))\,dB(t) + \frac{1}{2}f''(B(t))\,dt. \tag{4.39}$$




Example 4.16: We find d(sin(B(t))).
$f(x) = \sin(x)$, $f'(x) = \cos(x)$, $f''(x) = -\sin(x)$. Thus

$$d(\sin(B(t))) = \cos(B(t))\,dB(t) - \frac{1}{2}\sin(B(t))\,dt.$$

Similarly,

$$d(\cos(B(t))) = -\sin(B(t))\,dB(t) - \frac{1}{2}\cos(B(t))\,dt.$$

Example 4.17: We find $d(e^{iB(t)})$ with $i^2 = -1$.

The application of Itô's formula to a complex-valued function means its application
to the real and imaginary parts of the function. A formal application by treating i as
another constant gives the same result. Using the above example, we can calculate
$d(e^{iB(t)}) = d\cos(B(t)) + i\,d\sin(B(t))$, or directly by using Itô's formula with
$f(x) = e^{ix}$, we have $f'(x) = ie^{ix}$, $f''(x) = -e^{ix}$ and

$$d\big(e^{iB(t)}\big) = ie^{iB(t)}\,dB(t) - \frac{1}{2}e^{iB(t)}\,dt.$$

Thus $X(t) = e^{iB(t)}$ has stochastic differential

$$dX(t) = iX(t)\,dB(t) - \frac{1}{2}X(t)\,dt.$$


Quadratic Variation of Itô Processes

Let Y(t) be an Itô process

$$Y(t) = Y(0) + \int_0^t \mu(s)\,ds + \int_0^t \sigma(s)\,dB(s), \tag{4.40}$$

where it is assumed that $\mu$ and $\sigma$ are such that the integrals in question are
defined. Then by the properties of the integrals, Y(t), $0 \le t \le T$, is a (random)
continuous function, the integral $\int_0^t \mu(s)\,ds$ is a continuous function of t and is
of finite variation (it is differentiable almost everywhere), and the Itô integral
$\int_0^t \sigma(s)\,dB(s)$ is continuous. Quadratic variation of Y on [0,t] is defined by
(see (1.13))

$$[Y](t) = [Y,Y]([0,t]) = \lim \sum_{i=0}^{n-1} \big(Y(t^n_{i+1}) - Y(t^n_i)\big)^2, \tag{4.41}$$

where for each n, $\{t^n_i\}$ is a partition of [0,t], and the limit is in probability
taken over partitions with $\delta_n = \max_i (t^n_{i+1} - t^n_i) \to 0$ as $n \to \infty$, and is given
by

$$\begin{aligned}
[Y](t) &= \left[\int_0^{\cdot} \mu(s)\,ds + \int_0^{\cdot} \sigma(s)\,dB(s)\right](t) \\
&= \left[\int_0^{\cdot} \mu(s)\,ds\right](t) + 2\left[\int_0^{\cdot} \mu(s)\,ds, \int_0^{\cdot} \sigma(s)\,dB(s)\right](t) + \left[\int_0^{\cdot} \sigma(s)\,dB(s)\right](t).
\end{aligned}$$
The result on the covariation, Theorem 1.11, states that the quadratic
covariation of a continuous function with a function of finite variation is zero.
This implies that the quadratic covariation of the integral $\int_0^t \mu(s)\,ds$ with terms
above is zero, and we obtain by using the result on the quadratic variation of
Itô integrals (Theorem 4.9)

$$[Y](t) = \left[\int_0^{\cdot} \sigma(s)\,dB(s)\right](t) = \int_0^t \sigma^2(s)\,ds. \tag{4.42}$$

If Y(t) and X(t) have stochastic differentials with respect to the same
Brownian motion B(t), then clearly process Y(t) + X(t) also has a stochastic
differential with respect to the same Brownian motion. It follows that covariation
of X and Y on [0,t] exists and is given by

$$[X,Y](t) = \frac{1}{2}\Big([X+Y, X+Y](t) - [X,X](t) - [Y,Y](t)\Big). \tag{4.43}$$

Theorem 1.11 has an important corollary

Theorem 4.15 If X and Y are Itô processes and X is of finite variation,
then covariation [X,Y](t) = 0.

Example 4.18: Let X(t) = exp(t), Y(t) = B(t), then [X,Y](t) = [exp, B](t) = 0.

Introduce a convention that allows a formal manipulation with stochastic
differentials.

$$dY(t)\,dX(t) = d[X,Y](t), \tag{4.44}$$

and in particular

$$(dY(t))^2 = d[Y,Y](t). \tag{4.45}$$

Since X(t) = t is a continuous function of finite variation and Y(t) = B(t)
is continuous with quadratic variation t, the following rules follow

$$dB(t)\,dt = 0, \quad (dt)^2 = 0, \tag{4.46}$$

but

$$(dB(t))^2 = d[B,B](t) = dt. \tag{4.47}$$
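The rules (4.46) and (4.47) can be seen at the level of sums over a fine partition. The sketch below (names and partition size are illustrative assumptions) computes $\sum(\Delta B_i)^2$, $\sum \Delta B_i\,\Delta t$ and $\sum(\Delta t)^2$ over a uniform partition of [0, t]: the first is close to t, the second equals $\Delta t \cdot B(t)$ and the third equals $t^2/n$, so both of the latter vanish as the partition is refined.

```python
import math
import random


def differential_products(n_steps=100000, t=1.0, seed=4):
    """Return (sum dB^2, sum dB*dt, sum dt^2) over a uniform partition of [0, t]."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    dbdb = 0.0
    dbdt = 0.0
    dtdt = 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, sd)
        dbdb += db * db  # approximates [B, B](t) = t
        dbdt += db * dt  # approximates [B, t](t) = 0
        dtdt += dt * dt  # approximates [t, t](t) = 0
    return dbdb, dbdt, dtdt


if __name__ == "__main__":
    print(differential_products())
```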


Remark 4.7: In some texts, for example, Protter (1992), quadratic variation 
is defined by adding the value Y?(0) to (4.41). The definition given here gives 
a more familiar looking formula for integration by parts, and it is used in many 
texts, for example, Rogers and Williams (1987) p.59, Metivier (1982) p.175. 
Integrals with respect to Itô processes

It is necessary to extend integration with respect to processes obtained from
Brownian motion. Let the Itô integral process $Y(t) = \int_0^t X(s)\,dB(s)$ be defined
for all $t \le T$, where X(t) is an adapted process, such that $\int_0^T X^2(s)\,ds < \infty$
with probability one. Let an adapted process H(t) satisfy $\int_0^T H^2(s)X^2(s)\,ds < \infty$
with probability one. Then the Itô integral process $Z(t) = \int_0^t H(s)X(s)\,dB(s)$
is also defined for all $t \le T$. In this case one can formally write by identifying
dY(t) and X(t)dB(t),

$$Z(t) = \int_0^t H(s)\,dY(s) := \int_0^t H(s)X(s)\,dB(s). \tag{4.48}$$

In Chapter 8 integrals with respect to Y(t) will be introduced in a direct way,
but the result agrees with the one above.
More generally, if Y is an Itô process satisfying

$$dY(t) = \mu(t)\,dt + \sigma(t)\,dB(t), \tag{4.49}$$

and H is adapted and satisfies $\int_0^t H^2(s)\sigma^2(s)\,ds < \infty$, $\int_0^t |H(s)\mu(s)|\,ds < \infty$,
then $Z(t) = \int_0^t H(s)\,dY(s)$ is defined as

$$Z(t) = \int_0^t H(s)\,dY(s) := \int_0^t H(s)\mu(s)\,ds + \int_0^t H(s)\sigma(s)\,dB(s). \tag{4.50}$$

Example 4.19: If a(t) denotes the number of shares held at time t, then the gain
from trading in shares during the time interval [0,T] is given by $\int_0^T a(t)\,dS(t)$.
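When trading takes place only at finitely many times, the integral in Example 4.19 reduces to a sum of (shares held) times (price change), with the holding fixed at the start of each period. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def trading_gain(holdings, prices):
    """Discrete version of int_0^T a(t) dS(t).

    holdings[i] is the number of shares held over [t_i, t_{i+1}], decided at t_i;
    prices[i] is the share price S(t_i), so prices has one more entry."""
    if len(prices) != len(holdings) + 1:
        raise ValueError("prices must have exactly one more entry than holdings")
    return sum(a * (prices[i + 1] - prices[i]) for i, a in enumerate(holdings))


if __name__ == "__main__":
    # Hold 1 share, then 2 shares, then none, over three periods.
    print(trading_gain([1, 2, 0], [10, 11, 9, 12]))  # 1*1 + 2*(-2) + 0*3 = -3
```

Evaluating the holding at the left end of each period mirrors the construction of the Itô integral for simple adapted processes.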


4.6 Itô's Formula for Itô processes

Theorem 4.16 (Itô's formula for f(X(t))) Let X(t) have a stochastic
differential for $0 \le t \le T$

$$dX(t) = \mu(t)\,dt + \sigma(t)\,dB(t). \tag{4.51}$$

If f(x) is twice continuously differentiable ($C^2$ function), then the stochastic
differential of the process Y(t) = f(X(t)) exists and is given by

$$\begin{aligned}
df(X(t)) &= f'(X(t))\,dX(t) + \frac{1}{2}f''(X(t))\,d[X,X](t) \\
&= f'(X(t))\,dX(t) + \frac{1}{2}f''(X(t))\sigma^2(t)\,dt \\
&= \Big(f'(X(t))\mu(t) + \frac{1}{2}f''(X(t))\sigma^2(t)\Big)dt + f'(X(t))\sigma(t)\,dB(t).
\end{aligned} \tag{4.52}$$

The meaning of the above is

$$f(X(t)) = f(X(0)) + \int_0^t f'(X(s))\,dX(s) + \frac{1}{2}\int_0^t f''(X(s))\sigma^2(s)\,ds, \tag{4.53}$$


where the first integral is an Itô integral with respect to the stochastic dif- 
ferential. Existence of the integrals in the formula (4.53) is assured by the 
arguments following Theorem 4.13. The proof also follows the same ideas as 
Theorem 4.13, and is omitted. Proofs of Itô’s formula can be found in Liptser 
and Shiryaev (2001), p. 124, Revuz and Yor (2001) p. 146, Protter (1992), p. 
71, Rogers and Williams (1990), p. 60. 


Example 4.20: Let X(t) have stochastic differential

$$dX(t) = X(t)\,dB(t) + \frac{1}{2}X(t)\,dt. \tag{4.54}$$

We find a process X satisfying (4.54). Let's look for a positive process X. Using
Itô's formula for $\ln X(t)$ ($(\ln x)' = 1/x$ and $(\ln x)'' = -1/x^2$),

$$d\ln X(t) = \frac{1}{X(t)}\,dX(t) - \frac{1}{2X^2(t)}X^2(t)\,dt = dB(t) + \frac{1}{2}dt - \frac{1}{2}dt = dB(t).$$

So that $\ln X(t) = \ln X(0) + B(t)$, and we find

$$X(t) = X(0)e^{B(t)}. \tag{4.55}$$

Using Itô's formula we verify that this X(t) indeed satisfies (4.54). We don't claim
at this stage that (4.55) is the only solution.
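The solution (4.55) can also be compared with a direct numerical approximation of (4.54). The sketch below uses the Euler-Maruyama scheme, a standard discretization that is not discussed in the text, and drives it with the same Brownian increments that are used to evaluate $X(0)e^{B(t)}$; the function name and step count are illustrative assumptions.

```python
import math
import random


def euler_vs_exact(n_steps=10000, t=1.0, x0=1.0, seed=5):
    """Return (Euler-Maruyama value at time t, x0 * exp(B(t))) for
    dX = X dB + (1/2) X dt, using the same simulated Brownian path for both."""
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    x = x0
    b = 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, sd)
        x += x * db + 0.5 * x * dt  # one Euler-Maruyama step for (4.54)
        b += db
    return x, x0 * math.exp(b)


if __name__ == "__main__":
    approx, exact = euler_vs_exact()
    print(approx, exact)  # the two values should agree to within about a percent
```

The agreement improves as the step size decreases, which is consistent with (4.55) being a solution of (4.54).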


Integration by Parts 


We give a representation of the quadratic covariation [X,Y](t) of two Itô
processes X(t) and Y(t) in terms of Itô integrals. This representation gives rise
to the integration by parts formula.

Quadratic covariation is a limit over decreasing partitions of [0,t],

$$[X,Y](t) = \lim_{\delta_n \to 0} \sum_{i=0}^{n-1} \big(X(t^n_{i+1}) - X(t^n_i)\big)\big(Y(t^n_{i+1}) - Y(t^n_i)\big). \tag{4.56}$$

The sum on the right above can be written as

$$\begin{aligned}
&\sum_{i=0}^{n-1} \big(X(t^n_{i+1})Y(t^n_{i+1}) - X(t^n_i)Y(t^n_i)\big) \\
&\qquad - \sum_{i=0}^{n-1} X(t^n_i)\big(Y(t^n_{i+1}) - Y(t^n_i)\big) - \sum_{i=0}^{n-1} Y(t^n_i)\big(X(t^n_{i+1}) - X(t^n_i)\big) \\
&= X(t)Y(t) - X(0)Y(0) \\
&\qquad - \sum_{i=0}^{n-1} X(t^n_i)\big(Y(t^n_{i+1}) - Y(t^n_i)\big) - \sum_{i=0}^{n-1} Y(t^n_i)\big(X(t^n_{i+1}) - X(t^n_i)\big).
\end{aligned}$$

The last two sums converge in probability to Itô integrals $\int_0^t X(s)\,dY(s)$ and
$\int_0^t Y(s)\,dX(s)$, cf. Remark (4.4). Thus the following expression is obtained

$$[X,Y](t) = X(t)Y(t) - X(0)Y(0) - \int_0^t X(s)\,dY(s) - \int_0^t Y(s)\,dX(s). \tag{4.57}$$

The formula for integration by parts (stochastic product rule) is given by

$$X(t)Y(t) - X(0)Y(0) = \int_0^t X(s)\,dY(s) + \int_0^t Y(s)\,dX(s) + [X,Y](t). \tag{4.58}$$

In differential notations this reads

$$d(X(t)Y(t)) = X(t)\,dY(t) + Y(t)\,dX(t) + d[X,Y](t). \tag{4.59}$$

If

$$dX(t) = \mu_X(t)\,dt + \sigma_X(t)\,dB(t), \tag{4.60}$$

$$dY(t) = \mu_Y(t)\,dt + \sigma_Y(t)\,dB(t), \tag{4.61}$$

then, as seen earlier, their quadratic covariation can be obtained formally by
multiplication of dX and dY, namely

$$d[X,Y](t) = dX(t)\,dY(t) = \sigma_X(t)\sigma_Y(t)(dB(t))^2 = \sigma_X(t)\sigma_Y(t)\,dt,$$

leading to the formula

$$d(X(t)Y(t)) = X(t)\,dY(t) + Y(t)\,dX(t) + \sigma_X(t)\sigma_Y(t)\,dt.$$

Note that if one of the processes is continuous and is of finite variation, then
the covariation term is zero. Thus for such processes the stochastic product
rule is the same as usual.

The integration by parts formula (4.59) can be established rigorously by
making the argument above more precise, or by using Itô's formula for the
function of two variables xy, or by approximations by simple processes.
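The algebra behind (4.57) and (4.58) is an exact identity for any two sequences of values sampled along a partition, before any limit is taken. The sketch below (hypothetical function name, made-up data) computes both sides of the discrete product rule.

```python
def summation_by_parts(xs, ys):
    """Return (lhs, rhs) of the discrete product rule for two sampled paths:

    lhs = X_n Y_n - X_0 Y_0
    rhs = sum X_i dY_i + sum Y_i dX_i + sum dX_i dY_i  (left end-point sums)."""
    if len(xs) != len(ys):
        raise ValueError("paths must be sampled at the same times")
    n = len(xs) - 1
    lhs = xs[-1] * ys[-1] - xs[0] * ys[0]
    x_dy = sum(xs[i] * (ys[i + 1] - ys[i]) for i in range(n))
    y_dx = sum(ys[i] * (xs[i + 1] - xs[i]) for i in range(n))
    cov = sum((xs[i + 1] - xs[i]) * (ys[i + 1] - ys[i]) for i in range(n))
    return lhs, x_dy + y_dx + cov


if __name__ == "__main__":
    print(summation_by_parts([0, 1, 3, 2], [1, 4, 2, 5]))  # both entries equal 10
```

In the limit over shrinking partitions the three sums become the two Itô integrals and the covariation in (4.58).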
Formula (4.57) provides an alternative representation for quadratic variation

$$[X,X](t) = X^2(t) - X^2(0) - 2\int_0^t X(s)\,dX(s). \tag{4.62}$$

For Brownian motion this formula was established in Example 4.2.
It follows from the definition of quadratic variation, that it is a non-decreasing
process in t, and consequently it is of finite variation. It is also
obvious from (4.62) that it is continuous. By the polarization identity,
covariation is also continuous and is of finite variation.


Example 4.21: X(t) has stochastic differential

$$dX(t) = B(t)\,dt + t\,dB(t), \quad X(0) = 0.$$

We find X(t), give its distribution, its mean and covariance. X(t) = tB(t) satisfies
the above equation, since the product rule for stochastic differentials is the same
as usual, when one of the processes is continuous and of finite variation. Thus
X(t) = tB(t) is Gaussian, with mean zero, and covariance function

$$\begin{aligned}
\gamma(t,s) &= \mathrm{Cov}(X(t), X(s)) = E(X(t)X(s)) \\
&= ts\,E(B(t)B(s)) = ts\,\mathrm{Cov}(B(t), B(s)) = ts\min(t,s).
\end{aligned}$$
Example 4.22: Let Y(t) have stochastic differential

$$dY(t) = \frac{1}{2}Y(t)\,dt + Y(t)\,dB(t), \quad Y(0) = 1.$$

Let X(t) = tB(t). We find d(X(t)Y(t)).

Y(t) is a Geometric Brownian motion $e^{B(t)}$ (see Example 4.20). For d(X(t)Y(t))
use the product rule. We need the expression for d[X,Y](t).

$$\begin{aligned}
d[X,Y](t) &= dX(t)\,dY(t) = \big(B(t)\,dt + t\,dB(t)\big)\Big(\frac{1}{2}Y(t)\,dt + Y(t)\,dB(t)\Big) \\
&= \frac{1}{2}B(t)Y(t)(dt)^2 + \Big(B(t)Y(t) + \frac{1}{2}tY(t)\Big)dB(t)\,dt + tY(t)(dB(t))^2 = tY(t)\,dt,
\end{aligned}$$

as $(dB(t))^2 = dt$ and all the other terms are zero. Thus

$$\begin{aligned}
d(X(t)Y(t)) &= X(t)\,dY(t) + Y(t)\,dX(t) + d[X,Y](t) \\
&= X(t)\,dY(t) + Y(t)\,dX(t) + tY(t)\,dt,
\end{aligned}$$

and substituting the expressions for X and Y the answer is obtained.


Example 4.23: Let f be a $C^2$ function and B(t) Brownian motion. We find
quadratic covariation [f(B),B](t).
We find the answer by doing formal calculations. Using Itô's formula

$$df(B(t)) = f'(B(t))\,dB(t) + \frac{1}{2}f''(B(t))\,dt,$$

and the convention

$$d[f(B),B](t) = df(B(t))\,dB(t),$$

we have

$$d[f(B),B](t) = df(B(t))\,dB(t) = f'(B(t))(dB(t))^2 + \frac{1}{2}f''(B(t))\,dB(t)\,dt = f'(B(t))\,dt.$$

Here we used $(dB)^2 = dt$, and $dB\,dt = 0$. Thus

$$[f(B),B](t) = \int_0^t f'(B(s))\,ds.$$

In a more intuitive way, from the definition of the covariation, taking limits over
shrinking partitions

$$\begin{aligned}
[f(B),B](t) &= \lim \sum_{i=0}^{n-1} \big(f(B(t^n_{i+1})) - f(B(t^n_i))\big)\big(B(t^n_{i+1}) - B(t^n_i)\big) \\
&= \lim \sum_{i=0}^{n-1} \frac{f(B(t^n_{i+1})) - f(B(t^n_i))}{B(t^n_{i+1}) - B(t^n_i)}\big(B(t^n_{i+1}) - B(t^n_i)\big)^2 \\
&\approx \lim \sum_{i=0}^{n-1} f'(B(t^n_i))\big(B(t^n_{i+1}) - B(t^n_i)\big)^2 = \int_0^t f'(B(s))\,ds,
\end{aligned}$$

where we have used Theorem 4.14 in the last equality.


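Example 4.24: Let f(t) be an increasing differentiable function, and let
X(t) = B(f(t)). We show that

$$[X,X](t) = [B(f),B(f)](t) = [B,B](f(t)) = f(t). \tag{4.63}$$

By taking limits over shrinking partitions

$$\begin{aligned}
[X,X](t) &= \lim \sum_{i=0}^{n-1} \big(B(f(t^n_{i+1})) - B(f(t^n_i))\big)^2 \\
&= \lim \sum_{i=0}^{n-1} \big(f(t^n_{i+1}) - f(t^n_i)\big)\left(\frac{B(f(t^n_{i+1})) - B(f(t^n_i))}{\sqrt{f(t^n_{i+1}) - f(t^n_i)}}\right)^2 \\
&= \lim \sum_{i=0}^{n-1} \big(f(t^n_{i+1}) - f(t^n_i)\big)Z_i^2 = \lim T_n,
\end{aligned}$$

where $Z_i = \dfrac{B(f(t^n_{i+1})) - B(f(t^n_i))}{\sqrt{f(t^n_{i+1}) - f(t^n_i)}}$ are standard Normal, and independent, by the
properties of Brownian motion, and $T_n = \sum_{i=0}^{n-1} \big(f(t^n_{i+1}) - f(t^n_i)\big)Z_i^2$. Then for any n,

$$E(T_n) = \sum_{i=0}^{n-1} \big(f(t^n_{i+1}) - f(t^n_i)\big) = f(t) - f(0) = f(t),$$

where f(0) = 0 is assumed, and

$$\mathrm{Var}(T_n) = \mathrm{Var}\left(\sum_{i=0}^{n-1} \big(f(t^n_{i+1}) - f(t^n_i)\big)Z_i^2\right) = 2\sum_{i=0}^{n-1} \big(f(t^n_{i+1}) - f(t^n_i)\big)^2,$$

by independence, and $\mathrm{Var}(Z^2) = E(Z^4) - (E(Z^2))^2 = 3 - 1 = 2$. The last sum converges
to zero, since f is of finite variation and continuous, implying that

$$E(T_n - f(t))^2 \to 0.$$

This means that the limit in $L^2$ of $T_n$ is f(t), which implies that the limit in
probability of $T_n$ is f(t), and [X,X](t) = f(t).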
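The result (4.63) can be illustrated by simulation. The sketch below takes the illustrative time change $f(t) = t^2$ (an assumption, with f(0) = 0) and samples the increments $B(f(t_{i+1})) - B(f(t_i))$ directly as independent Normal variables with variance $f(t_{i+1}) - f(t_i)$, which is exactly the sum $T_n$ of the example.

```python
import math
import random


def qv_time_changed(f, t=1.0, n_steps=100000, seed=6):
    """Sum of squared increments of X(s) = B(f(s)) over a uniform partition of [0, t].

    f is assumed increasing with f(0) = 0, so each increment of X is
    N(0, f(t_{i+1}) - f(t_i)) and the increments are independent."""
    rng = random.Random(seed)
    h = t / n_steps
    qv = 0.0
    for i in range(n_steps):
        df = f((i + 1) * h) - f(i * h)
        inc = rng.gauss(0.0, math.sqrt(df))
        qv += inc * inc
    return qv


def square(s):
    """Illustrative time change f(s) = s^2."""
    return s * s


if __name__ == "__main__":
    print(qv_time_changed(square, t=1.5))  # should be close to f(1.5) = 2.25
```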


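Itô's Formula for Functions of Two Variables

If two processes X and Y both possess a stochastic differential with respect
to B(t) and f(x,y) has continuous partial derivatives up to order two, then
f(X(t),Y(t)) also possesses a stochastic differential. To find its form consider
formally the Taylor expansion of order two,

$$\begin{aligned}
df(x,y) &= \frac{\partial f(x,y)}{\partial x}\,dx + \frac{\partial f(x,y)}{\partial y}\,dy \\
&\quad + \frac{1}{2}\left(\frac{\partial^2 f(x,y)}{\partial x^2}(dx)^2 + \frac{\partial^2 f(x,y)}{\partial y^2}(dy)^2 + 2\frac{\partial^2 f(x,y)}{\partial x\,\partial y}\,dx\,dy\right).
\end{aligned}$$

Now, $(dX(t))^2 = dX(t)\,dX(t) = d[X,X](t) = \sigma_X^2(X(t))\,dt$,
$(dY(t))^2 = d[Y,Y](t) = \sigma_Y^2(Y(t))\,dt$, and $dX(t)\,dY(t) = d[X,Y](t)
= \sigma_X(X(t))\sigma_Y(Y(t))\,dt$, where $\sigma_X(t)$ and $\sigma_Y(t)$ are the diffusion coefficients
of X and Y respectively. So we have

Theorem 4.17 Let f(x,y) have continuous partial derivatives up to order
two (a $C^2$ function) and X, Y be Itô processes, then

$$\begin{aligned}
df(X(t),Y(t)) &= \frac{\partial f}{\partial x}(X(t),Y(t))\,dX(t) + \frac{\partial f}{\partial y}(X(t),Y(t))\,dY(t) \\
&\quad + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}(X(t),Y(t))\sigma_X^2(X(t))\,dt + \frac{1}{2}\frac{\partial^2 f}{\partial y^2}(X(t),Y(t))\sigma_Y^2(Y(t))\,dt \\
&\quad + \frac{\partial^2 f}{\partial x\,\partial y}(X(t),Y(t))\sigma_X(X(t))\sigma_Y(Y(t))\,dt.
\end{aligned} \tag{4.64}$$

The proof is similar to that of Theorem 4.13, and is omitted. It is stressed that
differential formulae have meaning only through their integral representation.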


Example 4.25: If f(x,y) = xy, then we obtain a differential of a product (or the
product rule) which gives the integration by parts formula.

$$d(X(t)Y(t)) = X(t)\,dY(t) + Y(t)\,dX(t) + \sigma_X(t)\sigma_Y(t)\,dt.$$

An important case of Itô's formula is for functions of the form f(X(t),t).

Theorem 4.18 Let f(x,t) be twice continuously differentiable in x, and
continuously differentiable in t (a $C^{2,1}$ function) and X be an Itô process, then

$$df(X(t),t) = \frac{\partial f}{\partial x}(X(t),t)\,dX(t) + \frac{\partial f}{\partial t}(X(t),t)\,dt + \frac{1}{2}\sigma_X^2(X(t),t)\frac{\partial^2 f}{\partial x^2}(X(t),t)\,dt. \tag{4.65}$$

This formula can be obtained from Theorem 4.17 by taking Y(t) = t and
observing that d[Y,Y] = 0 and d[X,Y] = 0.

Example 4.26: We find stochastic differential of $X(t) = e^{B(t) - t/2}$.
Use Itô's formula with $f(x,t) = e^{x - t/2}$. X(t) = f(B(t),t) satisfies

$$\begin{aligned}
dX(t) = df(B(t),t) &= \frac{\partial f}{\partial x}\,dB(t) + \frac{\partial f}{\partial t}\,dt + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\,dt \\
&= f(B(t),t)\,dB(t) - \frac{1}{2}f(B(t),t)\,dt + \frac{1}{2}f(B(t),t)\,dt \\
&= f(B(t),t)\,dB(t) = X(t)\,dB(t).
\end{aligned}$$

So that

$$dX(t) = X(t)\,dB(t).$$


4.7 Itô Processes in Higher Dimensions

Let $B(t) = (B_1(t), B_2(t), \dots, B_d(t))$ be Brownian motion in $R^d$, that is, all
coordinates $B_i(t)$ are independent one-dimensional Brownian motions. Let $\mathcal{F}_t$
be the $\sigma$-field generated by B(s), $s \le t$. Let H(t) be a regular adapted
d-dimensional vector process, i.e. each of its coordinates is such. If for each j,
$\int_0^T H_j^2(t)\,dt < \infty$, then the Itô integrals $\int_0^T H_j(t)\,dB_j(t)$ are defined. A single
equivalent condition in terms of the length of vector $|H|^2 = \sum_{i=1}^d H_i^2$ is

$$\int_0^T |H(t)|^2\,dt < \infty.$$

It is customary to use a scalar product notation (even suppressing $\cdot$)

$$H(t)\cdot dB(t) = \sum_{j=1}^d H_j(t)\,dB_j(t), \quad\text{and}\quad \int_0^T H(t)\cdot dB(t) = \sum_{j=1}^d \int_0^T H_j(t)\,dB_j(t). \tag{4.66}$$

If b(t) is an integrable function then the process

$$dX(t) = b(t)\,dt + \sum_{j=1}^d H_j(t)\,dB_j(t)$$

is well defined. It is a scalar Itô process driven by a d-dimensional Brownian
motion. More generally, we can have any number n of processes driven by a
d-dimensional Brownian motion (the vector $H_i = (\sigma_{i1}, \dots, \sigma_{id})$),

$$dX_i(t) = b_i(t)\,dt + \sum_{j=1}^d \sigma_{ij}(t)\,dB_j(t), \quad i = 1, \dots, n, \tag{4.67}$$

where $\sigma$ is an n x d matrix valued function, B is d-dimensional Brownian motion,
X, b are n-dim vector-valued functions, the integrals with respect to Brownian
motion are Itô integrals. Then X is called an Itô process. In vector form (4.67)
becomes

$$dX(t) = b(t)\,dt + \sigma(t)\,dB(t). \tag{4.68}$$

The dependence of b(t) and $\sigma(t)$ on time t can be via the whole path of
the process up to time t, path of $B_s$, $s \le t$. The only restriction is that this
dependence results in:
for any i = 1,2,...,n, $b_i(t)$ is adapted and $\int_0^T |b_i(t)|\,dt < \infty$ a.s.;
for any i = 1,2,...,n and j = 1,2,...,d, $\sigma_{ij}(t)$ is adapted and $\int_0^T \sigma_{ij}^2(t)\,dt < \infty$ a.s.,
which assure existence of the required integrals.

An important case is when this dependence is of the form b(t) = b(X(t),t),
$\sigma(t) = \sigma(X(t),t)$. In this case the stochastic differential is written as

$$dX(t) = b(X(t),t)\,dt + \sigma(X(t),t)\,dB(t), \tag{4.69}$$

and X(t) is then a diffusion process, see Chapters 5 and 6.

For Itô's formula we need the quadratic variation of multi-dimensional Itô
processes. It is not hard to see that quadratic covariation of two independent
Brownian motions is zero.


Theorem 4.19 Let $B_1(t)$ and $B_2(t)$ be independent Brownian motions. Then
their covariation process exists and is identically zero.
PROOF: Let $\{t^n_i\}$ be a partition of [0,t] and consider

$$T_n = \sum_{i=0}^{n-1} \big(B_1(t^n_{i+1}) - B_1(t^n_i)\big)\big(B_2(t^n_{i+1}) - B_2(t^n_i)\big).$$

Using independence of $B_1$ and $B_2$, $E(T_n) = 0$. Since increments of Brownian
motion are independent, the variance of the sum is sum of variances, and we
have

$$\begin{aligned}
\mathrm{Var}(T_n) &= \sum_{i=0}^{n-1} E\big(B_1(t^n_{i+1}) - B_1(t^n_i)\big)^2 E\big(B_2(t^n_{i+1}) - B_2(t^n_i)\big)^2 \\
&= \sum_{i=0}^{n-1} (t^n_{i+1} - t^n_i)^2 \le \max_i (t^n_{i+1} - t^n_i)\,t.
\end{aligned}$$

Thus $\mathrm{Var}(T_n) = E(T_n^2) \to 0$ as $\delta_n = \max_i (t^n_{i+1} - t^n_i) \to 0$. This implies that
$T_n \to 0$ in probability, and the result is proved.
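The sum $T_n$ in the proof can be simulated directly. In the sketch below (function name and partition sizes are illustrative assumptions) the increments of two independent Brownian motions over a uniform partition are drawn independently; by the computation above $T_n$ has mean zero and variance $t^2/n$, so its values concentrate around zero as n grows.

```python
import math
import random


def cross_variation(n_steps, t=1.0, seed=7):
    """Sum of products of increments of two independent Brownian motions
    over a uniform partition of [0, t] (the sum T_n in the proof)."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_steps)
    total = 0.0
    for _ in range(n_steps):
        db1 = rng.gauss(0.0, sd)  # increment of B_1
        db2 = rng.gauss(0.0, sd)  # independent increment of B_2
        total += db1 * db2
    return total


if __name__ == "__main__":
    for n in (100, 10000, 100000):
        print(n, cross_variation(n))  # values shrink towards 0 as n grows
```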


Thus for $k \ne l$, k,l = 1,2,...,d,

$$[B_k, B_l](t) = 0. \tag{4.70}$$

Using (4.70), and the bi-linearity of covariation, it is easy to see from (4.67)

$$d[X_i, X_j](t) = dX_i(t)\,dX_j(t) = a_{ij}\,dt, \quad \text{for } i,j = 1, \dots, n, \tag{4.71}$$

where a, called the diffusion matrix, is given by

$$a = \sigma\sigma^{Tr}, \tag{4.72}$$

with $\sigma^{Tr}$ denoting the transposed matrix of $\sigma$.


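Itô's Formula for Functions of Several Variables

If $X(t) = (X_1(t), X_2(t), \dots, X_n(t))$ is a vector Itô process and $f(x_1, x_2, \dots, x_n)$
is a $C^2$ function of n variables, then $f(X_1(t), X_2(t), \dots, X_n(t))$ is also an Itô
process, moreover its stochastic differential is given by

$$\begin{aligned}
df(X_1(t), \dots, X_n(t)) &= \sum_{i=1}^n \frac{\partial f}{\partial x_i}(X_1(t), \dots, X_n(t))\,dX_i(t) \\
&\quad + \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_i\,\partial x_j}(X_1(t), \dots, X_n(t))\,d[X_i, X_j](t).
\end{aligned} \tag{4.73}$$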


When there is only one Brownian motion, d = 1, this formula is a generaliza- 
tion of Itô’s formula for a function of two variables (Theorem 4.17). 

For examples and applications see multi-dimensional diffusions in Chapters 
5 and 6. We comment here on the integration by parts formula. 


Remark 4.8: (Integration by Parts)

Let X(t) and Y(t) be two Itô processes that are adapted to independent
Brownian motions $B_1$ and $B_2$. Take f(x,y) = xy and note that only one of the
second derivatives is different from zero, $\frac{\partial^2 xy}{\partial x\,\partial y}$, but then the term it multiplies
is zero, $d[B_1, B_2](t) = 0$ by Theorem 4.19. So the covariation of X(t) and Y(t)
is zero, and one obtains from (4.73)

$$d(X(t)Y(t)) = X(t)\,dY(t) + Y(t)\,dX(t), \tag{4.74}$$

which is the usual integration by parts formula.
Remark 4.9: In some applications correlated Brownian motions are used.
These are obtained by a linear transformation of independent Brownian
motions. If $B_1$ and $B_2$ are independent, then the pair of processes $B_1$ and
$W = \rho B_1 + \sqrt{1 - \rho^2}\,B_2$ are correlated Brownian motions. It is easy to see that
W is indeed a Brownian motion, and that $d[B_1, W](t) = \rho\,dt$.
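The construction in Remark 4.9 is easy to check on simulated increments. The sketch below (hypothetical function name; the value of $\rho$ is chosen for illustration) builds the increments of W from those of two independent Brownian motions and returns the sums approximating $[B_1, W](t)$ and $[W, W](t)$, which should be close to $\rho t$ and t respectively.

```python
import math
import random


def correlated_covariation(rho, n_steps=100000, t=1.0, seed=8):
    """Return (sum dB1*dW, sum dW^2) over a uniform partition of [0, t],
    where dW = rho*dB1 + sqrt(1 - rho^2)*dB2 with dB1, dB2 independent."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_steps)
    c = math.sqrt(1.0 - rho * rho)
    cov = 0.0
    qv_w = 0.0
    for _ in range(n_steps):
        db1 = rng.gauss(0.0, sd)
        db2 = rng.gauss(0.0, sd)
        dw = rho * db1 + c * db2
        cov += db1 * dw   # approximates [B_1, W](t) = rho * t
        qv_w += dw * dw   # approximates [W, W](t) = t
    return cov, qv_w


if __name__ == "__main__":
    print(correlated_covariation(0.6))  # roughly (0.6, 1.0)
```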


More results about Itô processes in higher dimensions are given in Chapter 6.

Remark 4.10: Itô's formula can be generalized to functions less smooth than
$C^2$, in particular for f(x) = |x|. Itô's formula for f(x) = |x| becomes Tanaka's
formula, and leads to the concept of local time. This development requires
additional concepts, which are given later, see Section 8.7 in the general theory
for semimartingales.


Notes. Material in this chapter can be found in Gihman and Skorohod (1972), 
Liptser and Shiryaev (1977), (1989), Karatzas and Shreve (1988), Gard (1988), 
Rogers and Williams (1990), (1994). 


4.8 Exercises 


Exercise 4.1: Give values of $\alpha$ for which the following process is defined
$Y(t) = \int_0^t (t-s)^{-\alpha}\,dB(s)$. (This process is used in the definition of the so-called
Fractional Brownian motion.)

Exercise 4.2: Show that if X is a simple bounded adapted process, then
$\int_0^t X(s)\,dB(s)$ is continuous.

Exercise 4.3: Let $X_n$ be a Gaussian sequence convergent in distribution to
X. Show that the distribution of X is either Normal or degenerate. Deduce
that if $EX_n \to \mu$ and $\mathrm{Var}(X_n) \to \sigma^2 > 0$ then the limit is $N(\mu, \sigma^2)$. Since
convergence in probability implies convergence in distribution, deduce
convergence of Itô integrals of simple non-random processes to a Gaussian limit.

Exercise 4.4: Show that if X(t,s) is non-random (does not depend on B(t))
and is a function of t and s with $\int_0^t X^2(t,s)\,ds < \infty$ then $\int_0^t X(t,s)\,dB(s)$ is a
Gaussian random variable Y(t). The collection Y(t), $0 \le t \le T$, is a Gaussian
process with zero mean and covariance function for $u \ge 0$ given by
$\mathrm{Cov}(Y(t), Y(t+u)) = \int_0^t X(t,s)X(t+u,s)\,ds$.


Exercise 4.5: Show that a Gaussian martingale on a finite time interval
[0,T] is a square integrable martingale with independent increments. Deduce
that if X is non-random and $\int_0^t X^2(s)\,ds < \infty$ then $Y(t) = \int_0^t X(s)\,dB(s)$ is a
Gaussian square integrable martingale with independent increments.

Exercise 4.6: Obtain the alternative relation for the quadratic variation of
Itô processes, Equation (4.62), by applying Itô's formula to $X^2(t)$.


Exercise 4.7: $X(t)$ has a stochastic differential with $\mu(x) = bx + c$ and 
$\sigma^2(x) = 4x$. Assuming $X(t) \ge 0$, find the stochastic differential for the process 
$Y(t) = \sqrt{X(t)}$. 


Exercise 4.8: A process $X(t)$ on $(0,1)$ has a stochastic differential with 
coefficient $\sigma(x) = x(1-x)$. Assuming $0 < X(t) < 1$, show that the process 
defined by $Y(t) = \ln(X(t)/(1 - X(t)))$ has a constant diffusion coefficient. 


Exercise 4.9: $X(t)$ has a stochastic differential with $\mu(x) = cx$ and $\sigma^2(x) = x^a$, 
$c > 0$. Let $Y(t) = X(t)^b$. What choice of $b$ will give a constant diffusion 
coefficient for $Y$? 


Exercise 4.10: Let $X(t) = tB(t)$ and $Y(t) = e^{B(t)}$. Find $d\left(\frac{X(t)}{Y(t)}\right)$. 


Exercise 4.11: Obtain the differential of a ratio formula $d\left(\frac{X(t)}{Y(t)}\right)$ by taking 
$f(x,y) = x/y$. Assume that the process $Y$ stays away from 0. 


Exercise 4.12: Find $d(M(t))^2$, where $M(t) = e^{B(t) - t/2}$. 


Exercise 4.13: Let $M(t) = B^3(t) - 3tB(t)$. Show that $M$ is a martingale, 
first directly and then by using Itô integrals. 


Exercise 4.14: Show that $M(t) = e^{t/2}\sin(B(t))$ is a martingale by using 
Itô’s formula. 


Exercise 4.15: For a function of $n$ variables and $n$-dimensional Brownian 
motion, write Itô’s formula for $f(B_1(t), \ldots, B_n(t))$ by using gradient notation 

$$\nabla f = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right).$$


Exercise 4.16: $\Phi(x)$ is the standard Normal distribution function. Show that 
for a fixed $T > 0$ the process $\Phi\left(\frac{B(t)}{\sqrt{T-t}}\right)$, $0 \le t < T$, is a martingale. 


Exercise 4.17: Let $X(t) = (1-t)\int_0^t \frac{dB(s)}{1-s}$, where $0 \le t < 1$. Find $dX(t)$. 


Exercise 4.18: Let $X(t) = tB(t)$. Find its quadratic variation $[X,X](t)$. 


Exercise 4.19: Let $X(t) = \int_0^t (t-s)\,dB(s)$. Find $dX(t)$ and its quadratic 
variation $[X,X](t)$. Compare to the quadratic variation of Itô integrals. 


Chapter 5 


Stochastic Differential 
Equations 


Differential equations are used to describe the evolution of a system. Stochastic 
Differential Equations (SDEs) arise when a random noise is introduced into 
ordinary differential equations (ODEs). In this chapter we define two concepts 
of solutions of SDEs, the strong and the weak solution. 


5.1 Definition of Stochastic Differential Equations 


Ordinary Differential Equations 


If $x(t)$ is a differentiable function defined for $t \ge 0$, $\mu(x,t)$ is a function of $x$ 
and $t$, and the following relation is satisfied for all $t$, $0 \le t \le T$, 

$$\frac{dx(t)}{dt} = x'(t) = \mu(x(t), t), \quad \text{and } x(0) = x_0, \tag{5.1}$$

then $x(t)$ is a solution of the ODE with the initial condition $x_0$. Usually the 
requirement that $x'(t)$ is continuous is added. See also Theorem 1.4. 
The above equation can be written in other forms: 

$$dx(t) = \mu(x(t), t)\,dt,$$

and (by continuity of $x'(t)$) 

$$x(t) = x(0) + \int_0^t \mu(x(s), s)\,ds.$$

Before we give a rigorous definition of SDEs, we show how they arise as 
randomly perturbed ODEs and give a physical interpretation. 


White Noise and SDEs 


The White Noise process $\xi(t)$ is formally defined as the derivative of the 
Brownian motion, 

$$\xi(t) = \frac{dB(t)}{dt} = B'(t). \tag{5.2}$$

It does not exist as a function of $t$ in the usual sense, since a Brownian motion 
is nowhere differentiable. 

If $\sigma(x,t)$ is the intensity of the noise at point $x$ at time $t$, then it is agreed 
that $\int_0^T \sigma(X(t),t)\xi(t)\,dt = \int_0^T \sigma(X(t),t)B'(t)\,dt = \int_0^T \sigma(X(t),t)\,dB(t)$, where 
the integral is an Itô integral. 

Stochastic Differential Equations arise, for example, when the coefficients 
of ordinary equations are perturbed by White Noise. 


Example 5.1: Black-Scholes-Merton model for growth with uncertain rate of return. 
$x(t)$ is the value of \$1 after time $t$, invested in a savings account. By the definition 
of compound interest, it satisfies the ODE $dx(t)/x(t) = r\,dt$, or $dx(t)/dt = rx(t)$ ($r$ 
is called the interest rate). If the rate is uncertain, it is taken to be perturbed by 
noise, $r + \sigma\xi(t)$, and the following SDE is obtained 

$$\frac{dX(t)}{dt} = (r + \sigma\xi(t))X(t),$$

meaning 

$$dX(t) = rX(t)\,dt + \sigma X(t)\,dB(t).$$

Case $\sigma = 0$ corresponds to no noise, and recovers the deterministic equation. The 
solution of the deterministic equation is easily obtained by separating variables as 
$x(t) = e^{rt}$. The solution to the above SDE is given by a geometric Brownian motion, 
as can be verified by Itô’s formula (see Example 5.5) 

$$X(t) = e^{(r - \frac{1}{2}\sigma^2)t + \sigma B(t)}. \tag{5.3}$$


Example 5.2: Population growth. If $x(t)$ denotes the population density, then the 
population growth can be described by the ODE $dx(t)/dt = ax(t)(1 - x(t))$. The 
growth is exponential with birth rate $a$ when this density is small, and slows down 
when the density increases. Random perturbation of the birth rate results in the 
equation $dX(t)/dt = (a + \sigma\xi(t))X(t)(1 - X(t))$, or the SDE 

$$dX(t) = aX(t)(1 - X(t))\,dt + \sigma X(t)(1 - X(t))\,dB(t).$$


A Physical Model of Diffusion and SDEs 


The physical phenomenon which gives rise to the mathematical model of diffusion 
(and of Brownian motion) is the microscopic motion of a particle suspended 
in a fluid. Molecules of the fluid move with various velocities and collide with 
the particle from every possible direction, producing a constant bombardment. 
As a result of this bombardment the particle exhibits an ever present erratic 
movement. This movement intensifies with increase in the temperature of the 
fluid. Denote by $X(t)$ the displacement of the particle in one direction from its 
initial position at time $t$. If $\sigma(x,t)$ measures the effect of temperature at point 
$x$ at time $t$, then the displacement due to bombardment during time $[t, t+\Delta]$ 
is modelled as $\sigma(x,t)(B(t+\Delta) - B(t))$. If the velocity of the fluid at point $x$ 
at time $t$ is $\mu(x,t)$, then the displacement of the particle due to the movement 
of the fluid during this time is $\mu(x,t)\Delta$. Thus the total displacement from its position 
$x$ at time $t$ is given by 

$$X(t+\Delta) - x \approx \mu(x,t)\Delta + \sigma(x,t)(B(t+\Delta) - B(t)). \tag{5.4}$$

From this equation, the mean displacement from $x$ during short 
time $\Delta$ is given by 

$$E\big((X(t+\Delta) - X(t)) \mid X(t) = x\big) \approx \mu(x,t)\Delta, \tag{5.5}$$

and the second moment of the displacement from $x$ during short time $\Delta$ is 
given by 

$$E\big((X(t+\Delta) - X(t))^2 \mid X(t) = x\big) \approx \sigma^2(x,t)\Delta. \tag{5.6}$$

The above relations show that for small intervals of time both the mean and 
the second moment (and variance) of the displacement of a diffusing particle at 
time $t$ at point $x$ are proportional to the length of the interval, with coefficients 
$\mu(x,t)$ and $\sigma^2(x,t)$ respectively. 

It can be shown that, taken as asymptotic relations as $\Delta \to 0$, that is, 
replacing the $\approx$ sign by equality and adding terms $o(\Delta)$ to the right-hand 
sides, these two requirements characterize diffusion processes. 

Assuming that $\mu(x,t)$ and $\sigma(x,t)$ are smooth functions, heuristic Equation 
(5.4) also points out that for small intervals of time $\Delta$, diffusions are 
approximately Gaussian processes. Given $X(t) = x$, $X(t+\Delta) - X(t)$ is approximately 
normally distributed, $N(\mu(x,t)\Delta, \sigma^2(x,t)\Delta)$. Of course, for large intervals of 
time diffusions are not Gaussian, unless the coefficients are non-random. 

A stochastic differential equation is obtained heuristically from the relation 
(5.4) by replacing $\Delta$ by $dt$, $\Delta B = B(t+\Delta) - B(t)$ by $dB(t)$, and 
$X(t+\Delta) - X(t)$ by $dX(t)$. 


Stochastic Differential Equations 


Let $B(t)$, $t \ge 0$, be a Brownian motion process. An equation of the form 

$$dX(t) = \mu(X(t),t)\,dt + \sigma(X(t),t)\,dB(t), \tag{5.7}$$

where functions $\mu(x,t)$ and $\sigma(x,t)$ are given and $X(t)$ is the unknown process, 
is called a stochastic differential equation (SDE) driven by Brownian motion. 
The functions $\mu(x,t)$ and $\sigma(x,t)$ are called the coefficients. 
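
Read as a recursion over a time grid, the heuristic relation (5.4) behind (5.7) suggests a simple way to simulate approximate paths. This recursion is commonly called the Euler-Maruyama scheme; the name and the code below are not from the text. A minimal Python sketch, in which the function name `euler_maruyama`, its parameters and all numerical values are illustrative assumptions:

```python
import math
import random


def euler_maruyama(mu, sigma, x0, T, n, rng):
    """Iterate X(t+D) = X(t) + mu(X,t)*D + sigma(X,t)*(B(t+D) - B(t)), cf. (5.4)."""
    dt = T / n
    x, t = x0, 0.0
    path = [x0]
    for _ in range(n):
        dB = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment, N(0, dt)
        x = x + mu(x, t) * dt + sigma(x, t) * dB
        t += dt
        path.append(x)
    return path


rng = random.Random(0)  # fixed seed so the run is reproducible

# With sigma = 0 the recursion is the ordinary Euler method for x' = r x, r = 0.05.
ode_path = euler_maruyama(lambda x, t: 0.05 * x, lambda x, t: 0.0, 1.0, 1.0, 1000, rng)

# Population-growth SDE of Example 5.2 with assumed values a = 1, sigma = 0.3.
pop_path = euler_maruyama(
    lambda x, t: x * (1.0 - x),
    lambda x, t: 0.3 * x * (1.0 - x),
    0.1, 5.0, 1000, rng,
)
```

The scheme only approximates a solution; whether and how fast it converges depends on conditions on the coefficients that are not addressed in this sketch.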


Definition 5.1 A process $X(t)$ is called a strong solution of the SDE (5.7) if 
for all $t > 0$ the integrals $\int_0^t \mu(X(s),s)\,ds$ and $\int_0^t \sigma(X(s),s)\,dB(s)$ exist, with 
the second being an Itô integral, and 

$$X(t) = X(0) + \int_0^t \mu(X(s),s)\,ds + \int_0^t \sigma(X(s),s)\,dB(s). \tag{5.8}$$


Remark 5.1: 

1. A strong solution is some function (functional) $F(t, (B(s), s \le t))$ of the 
given Brownian motion $B(t)$. 

2. When $\sigma = 0$, the SDE becomes an ordinary differential equation (ODE). 

3. Another interpretation of (5.7), called the weak solution, is a solution in 
distribution which will be given later. 


Equations of the form (5.7) are called diffusion-type SDEs. More general 
SDEs have the form 

$$dX(t) = \mu(t)\,dt + \sigma(t)\,dB(t), \tag{5.9}$$

where $\mu(t)$ and $\sigma(t)$ can depend on $t$ and the whole past of the processes 
$X(t)$ and $B(t)$, $(X(s), B(s), s \le t)$, that is, $\mu(t) = \mu((X(s), s \le t), t)$, $\sigma(t) = 
\sigma((X(s), s \le t), t)$. The only restriction on $\mu(t)$ and $\sigma(t)$ is that they must be 
adapted processes, with respective integrals defined. Although many results 
(such as existence and uniqueness results) can be formulated for general SDEs, 
we concentrate here on diffusion-type SDEs. 


Example 5.3: We have seen that X(t) = exp(B(t) — t/2) is a solution of the 
stochastic exponential SDE dX(t) = X(t)dB(t), X(0) =1. 


Example 5.4: Consider the process $X(t)$ satisfying $dX(t) = a(t)\,dB(t)$, where 
$a(t)$ is non-random. Clearly, $X(t) = X(0) + \int_0^t a(s)\,dB(s)$. We can represent 
this as a function of the Brownian motion by integrating by parts, $X(t) = X(0) + 
a(t)B(t) - \int_0^t B(s)a'(s)\,ds$, assuming $a(t)$ is differentiable. In this case the function 
$F(t, (x(s), s \le t)) = X(0) + a(t)x(t) - \int_0^t x(s)a'(s)\,ds$. 


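The next two examples demonstrate how to find a strong solution by using 
Itô’s formula and integration by parts. 


Example 5.5: (Example 5.1 continued) 
Consider the SDE 

$$dX(t) = \mu X(t)\,dt + \sigma X(t)\,dB(t), \quad X(0) = 1. \tag{5.10}$$

Take $f(x) = \ln x$, then $f'(x) = 1/x$ and $f''(x) = -1/x^2$. 

$$d(\ln X(t)) = \frac{1}{X(t)}\,dX(t) + \frac{1}{2}\left(-\frac{1}{X^2(t)}\right)\sigma^2 X^2(t)\,dt$$

$$= \frac{1}{X(t)}\big(\mu X(t)\,dt + \sigma X(t)\,dB(t)\big) - \frac{1}{2}\sigma^2\,dt$$

$$= \left(\mu - \frac{1}{2}\sigma^2\right)dt + \sigma\,dB(t).$$

So that $Y(t) = \ln X(t)$ satisfies 

$$dY(t) = \left(\mu - \frac{1}{2}\sigma^2\right)dt + \sigma\,dB(t).$$

Its integral representation gives 

$$Y(t) = Y(0) + \left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma B(t),$$

and 

$$X(t) = X(0)e^{(\mu - \frac{1}{2}\sigma^2)t + \sigma B(t)}. \tag{5.11}$$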
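
As a numerical sanity check of (5.11), one can drive a step-by-step discretisation of (5.10) and the closed form with the same Brownian increments and compare the values at the final time. The sketch below is only an illustration; the parameter values, the seed and the variable names are assumptions made here.

```python
import math
import random

mu, sigma, T, n = 0.1, 0.2, 1.0, 10_000  # assumed illustrative values
dt = T / n
rng = random.Random(1)

x_euler = 1.0  # X(0) = 1 as in (5.10)
b = 0.0        # running value of the Brownian path B(t)
for _ in range(n):
    dB = rng.gauss(0.0, math.sqrt(dt))
    # one step of dX = mu X dt + sigma X dB
    x_euler += mu * x_euler * dt + sigma * x_euler * dB
    b += dB

# closed form (5.11) evaluated on the same path, using only B(T)
x_exact = math.exp((mu - 0.5 * sigma**2) * T + sigma * b)
rel_err = abs(x_euler - x_exact) / x_exact
```

On a fine grid the two values should agree closely; the discrepancy comes from the time discretisation, not from the formula.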


Example 5.6: (Langevin equation and Ornstein-Uhlenbeck process) 
Consider the SDE 

$$dX(t) = -\alpha X(t)\,dt + \sigma\,dB(t), \tag{5.12}$$

where $\alpha$ and $\sigma$ are some non-negative constants. 

Note that in the case $\sigma = 0$, the solution to the ODE is $x_0 e^{-\alpha t}$, or in other 
words $x(t)e^{\alpha t}$ is a constant. To solve the SDE consider the process $Y(t) = X(t)e^{\alpha t}$. 
Use the differential of the product rule, and note that the covariation of $e^{\alpha t}$ with 
$X(t)$ is zero, as it is a differentiable function ($d(e^{\alpha t})\,dX(t) = \alpha e^{\alpha t}\,dt\,dX(t) = 0$); we 
have $dY(t) = e^{\alpha t}\,dX(t) + \alpha e^{\alpha t}X(t)\,dt$. Using the SDE for $dX(t)$ we obtain $dY(t) = 
\sigma e^{\alpha t}\,dB(t)$. This gives $Y(t) = Y(0) + \int_0^t \sigma e^{\alpha s}\,dB(s)$. Now the solution for $X(t)$ is 

$$X(t) = e^{-\alpha t}\left(X(0) + \int_0^t \sigma e^{\alpha s}\,dB(s)\right). \tag{5.13}$$

The process $X(t)$ in (5.12) is known as the Ornstein-Uhlenbeck process. 

We can also find the functional dependence of the solution on the Brownian 
motion path. Performing integration by parts, we find the function giving the strong 
solution 

$$X(t) = F(t, (B(s), 0 \le s \le t)) = e^{-\alpha t}X(0) + \sigma B(t) - \sigma\alpha\int_0^t e^{-\alpha(t-s)}B(s)\,ds.$$

A more general equation is 

$$dX(t) = (\beta - \alpha X(t))\,dt + \sigma\,dB(t), \tag{5.14}$$

with the solution 

$$X(t) = \frac{\beta}{\alpha} + e^{-\alpha t}\left(X(0) - \frac{\beta}{\alpha} + \sigma\int_0^t e^{\alpha s}\,dB(s)\right). \tag{5.15}$$


Using Itô’s formula it is easy to verify that (5.15) is indeed a solution. 
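
One way to carry out this verification is sketched below, assuming $\alpha \ne 0$. The auxiliary process $Z(t)$ is notation introduced only for this sketch and is not used elsewhere in the text.

```latex
% Z(t) is an auxiliary name (an assumption of this sketch): the bracket in (5.15).
Z(t) = X(0) - \frac{\beta}{\alpha} + \sigma\int_0^t e^{\alpha s}\,dB(s),
\qquad dZ(t) = \sigma e^{\alpha t}\,dB(t),
\qquad X(t) = \frac{\beta}{\alpha} + e^{-\alpha t}Z(t).

% e^{-\alpha t} is differentiable, so its covariation with Z(t) is zero (product rule):
dX(t) = -\alpha e^{-\alpha t}Z(t)\,dt + e^{-\alpha t}\,dZ(t)
      = -\alpha\Big(X(t) - \frac{\beta}{\alpha}\Big)\,dt + \sigma\,dB(t)
      = \big(\beta - \alpha X(t)\big)\,dt + \sigma\,dB(t).
```

The last line is (5.14), and $X(0) = \beta/\alpha + Z(0)$ recovers the initial value.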


Example 5.7: Consider the SDE $dX(t) = B(t)\,dB(t)$. 
Clearly, $X(t) = X(0) + \int_0^t B(s)\,dB(s)$, and using integration by parts (or Itô’s 
formula), we obtain 

$$X(t) = X(0) + \frac{1}{2}\big(B^2(t) - t\big).$$
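
The formula can be illustrated with the left-endpoint sums that approximate the Itô integral. On any grid these sums satisfy the exact algebraic identity $\sum_i B(t_i)\Delta B_i = \frac{1}{2}\big(B^2(t) - \sum_i (\Delta B_i)^2\big)$, and the sum of squared increments is close to $t$ on a fine grid. A small Python sketch, with the grid size and seed chosen arbitrarily for illustration:

```python
import math
import random

n, T = 10_000, 1.0  # assumed grid size and time horizon
dt = T / n
rng = random.Random(2)

b = 0.0        # B(t_i), starting from B(0) = 0
ito_sum = 0.0  # sum of B(t_i) * (B(t_{i+1}) - B(t_i)), left endpoints
qv = 0.0       # sum of squared increments (approximates [B, B](T) = T)
for _ in range(n):
    dB = rng.gauss(0.0, math.sqrt(dt))
    ito_sum += b * dB
    qv += dB * dB
    b += dB

identity_gap = abs(ito_sum - 0.5 * (b * b - qv))  # exact up to rounding
closed_form = 0.5 * (b * b - T)                   # (B^2(T) - T) / 2
```

The difference between `ito_sum` and `closed_form` is `0.5 * (T - qv)`, which shrinks as the grid is refined.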


Remark 5.2: If a strong solution exists, then by definition it is adapted 
to the filtration of the given Brownian motion, and as such it is intuitively 
clear that it is a function of the path $(B(s), s \le t)$. Results of Yamada and 
Watanabe (1971), and Kallenberg (1996) state that provided the conditions of 
the existence and uniqueness theorem are satisfied, there exists a function 
$F$ such that the strong solution is given by $X(t) = F(t, (B(s), s \le t))$. Not 
much is known about $F$ in general. Often it is not easy to find this function 
even for Itô integrals $X(t) = \int_0^t f(B(s))\,dB(s)$, e.g. $X(t) = \int_0^t |B(s)|^{1/2}\,dB(s)$. 
For a representation of Itô integrals as functions of Brownian motion paths, 
see, for example, Rogers and Williams (1990), p.125-127. 


Only some classes of SDEs admit a closed form solution. When a closed 
form solution is hard to find, an existence and uniqueness result is important, 
because without it, it is not clear what exactly the equation means. When 
a solution exists and is unique, then numerical methods can be employed to 
compute it. Similarly to ordinary differential equations, linear SDEs can be 
solved explicitly. 


5.2 Stochastic Exponential and Logarithm 

Let $X$ have a stochastic differential, and $U$ satisfy 

$$dU(t) = U(t)\,dX(t), \text{ and } U(0) = 1, \quad \text{or} \quad U(t) = 1 + \int_0^t U(s)\,dX(s). \tag{5.16}$$

Then $U$ is called the stochastic exponential of $X$, and is denoted by $\mathcal{E}(X)$. If 
$X(t)$ is of finite variation then the solution to (5.16) is given by $U(t) = e^{X(t) - X(0)}$. 
For Itô processes the solution is given by the following theorem. 


Theorem 5.2 The only solution of (5.16) is given by 

$$U(t) = \mathcal{E}(X)(t) = e^{X(t) - X(0) - \frac{1}{2}[X,X](t)}. \tag{5.17}$$
PROOF: The proof of existence of a solution to (5.16) consists of verification, 
by using Itô’s formula, of (5.17). Write $U(t) = e^{V(t)}$, with $V(t) = X(t) - 
X(0) - \frac{1}{2}[X,X](t)$. Then 

$$d\mathcal{E}(X)(t) = dU(t) = d(e^{V(t)}) = e^{V(t)}\,dV(t) + \frac{1}{2}e^{V(t)}\,d[V,V](t).$$

Since $[X,X](t)$ is of finite variation, and $X(t)$ is continuous, $[X,[X,X]](t) = 0$, 
and $[V,V](t) = [X,X](t)$. Using this with the expression for $V(t)$, we obtain 

$$d\mathcal{E}(X)(t) = e^{V(t)}\,dX(t) - \frac{1}{2}e^{V(t)}\,d[X,X](t) + \frac{1}{2}e^{V(t)}\,d[X,X](t) = e^{V(t)}\,dX(t),$$

and (5.16) is established. The proof of uniqueness is done by assuming that 
there is another process satisfying (5.16), say $U_1(t)$, and showing by integration 
by parts that $d(U_1(t)/U(t)) = 0$. It is left as an exercise. 


Note that unlike in the case of the usual exponential $g(t) = \exp(f)(t) = e^{f(t)}$, 
the stochastic exponential $\mathcal{E}(X)$ requires the knowledge of all the values of the 
process up to time $t$, since it involves the quadratic variation term $[X,X](t)$. 


Example 5.8: Stochastic exponential of Brownian motion $B(t)$ is given by $U(t) = 
\mathcal{E}(B)(t) = e^{B(t) - \frac{1}{2}t}$, and it satisfies for all $t$, $dU(t) = U(t)\,dB(t)$ with $U(0) = 1$. 
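
A discrete analogue of (5.16) with $X = B$ is the recursion $U_{k+1} = U_k(1 + \Delta B_k)$, $U_0 = 1$, whose solution is the product of the factors $1 + \Delta B_k$. The sketch below compares the logarithm of this product with the exponent $B(t) - t/2$; the grid size, seed and variable names are assumptions made for illustration.

```python
import math
import random

n, T = 50_000, 1.0  # assumed grid size and time horizon
dt = T / n
rng = random.Random(3)

log_u = 0.0  # log of the product of the factors (1 + dB_k)
b = 0.0      # B(T), accumulated from the same increments
for _ in range(n):
    dB = rng.gauss(0.0, math.sqrt(dt))
    log_u += math.log1p(dB)  # increments are tiny here, so 1 + dB > 0
    b += dB

u_discrete = math.exp(log_u)     # solution of the discrete recursion at time T
u_closed = math.exp(b - 0.5 * T) # stochastic exponential of B at time T
gap = abs(log_u - (b - 0.5 * T))
```

The correction $-t/2$ appears because $\ln(1 + \Delta B) \approx \Delta B - \frac{1}{2}(\Delta B)^2$ and the squared increments add up to approximately $t$.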


Example 5.9: Application in Finance: Stock process and its Return process. 

Let $S(t)$ denote the price of stock and assume that it is an Itô process, i.e. it has 
a stochastic differential. The process of the return on stock $R(t)$ is defined by the 
relation 

$$dR(t) = \frac{dS(t)}{S(t)}.$$

In other words 

$$dS(t) = S(t)\,dR(t) \tag{5.18}$$

and the stock price is the stochastic exponential of the return. Returns are usually 
easier to model from first principles. For example, in the Black-Scholes model it 
is assumed that the returns over non-overlapping time intervals are independent, 
and have finite variance. This assumption leads to the model for the return process 
$R(t) = \mu t + \sigma B(t)$. The stock price is then given by 

$$S(t) = S(0)\mathcal{E}(R)(t) = S(0)e^{R(t) - R(0) - \frac{1}{2}[R,R](t)} = S(0)e^{(\mu - \frac{1}{2}\sigma^2)t + \sigma B(t)}. \tag{5.19}$$


Stochastic Logarithm 

If $U = \mathcal{E}(X)$, then the process $X$ is called the stochastic logarithm of $U$, 
denoted $\mathcal{L}(U)$. This is the inverse operation to the stochastic exponential. 
For example, the stochastic exponential of Brownian motion $B(t)$ is given by 
$e^{B(t) - \frac{1}{2}t}$. So $B(t)$ is the stochastic logarithm of $e^{B(t) - \frac{1}{2}t}$. 

Theorem 5.3 Let $U$ have a stochastic differential and not take value 0. Then 
the stochastic logarithm of $U$ satisfies the SDE 

$$dX(t) = \frac{dU(t)}{U(t)}, \quad X(0) = 0, \tag{5.20}$$

moreover 

$$X(t) = \mathcal{L}(U)(t) = \ln\left(\frac{U(t)}{U(0)}\right) + \int_0^t \frac{d[U,U](s)}{2U^2(s)}. \tag{5.21}$$


PROOF: The SDE for the stochastic logarithm $\mathcal{L}(U)$ follows from the definition of 
$\mathcal{E}(X)$. The solution (5.21) and uniqueness are obtained by Itô’s formula. 


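Example 5.10: Let $U(t) = e^{B(t)}$. We find its stochastic logarithm $\mathcal{L}(U)$ directly 
and then verify (5.21). $dU(t) = e^{B(t)}\,dB(t) + \frac{1}{2}e^{B(t)}\,dt$. Hence 

$$dX(t) = d(\mathcal{L}(U))(t) = \frac{dU(t)}{U(t)} = dB(t) + \frac{1}{2}\,dt.$$

Thus 

$$X(t) = \mathcal{L}(U)(t) = B(t) + \frac{1}{2}t.$$

Now, $d[U,U](t) = dU(t)\,dU(t) = e^{2B(t)}\,dt$, so that 

$$\mathcal{L}(U)(t) = \ln U(t) + \int_0^t \frac{e^{2B(s)}\,ds}{2e^{2B(s)}} = B(t) + \int_0^t \frac{1}{2}\,ds = B(t) + \frac{1}{2}t,$$

which verifies (5.21). 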


Remark 5.3: The stochastic Logarithm is useful in financial applications (see 
Kallsen and Shiryaev (2002)). 


5.3 Solutions to Linear SDEs 


Linear SDEs form a class of SDEs that can be solved explicitly. Consider 
the general linear SDE in one dimension 

$$dX(t) = (\alpha(t) + \beta(t)X(t))\,dt + (\gamma(t) + \delta(t)X(t))\,dB(t), \tag{5.22}$$

where functions $\alpha, \beta, \gamma, \delta$ are given adapted processes, and are continuous 
functions of $t$. Examples considered in the previous section are particular cases of 
linear SDEs. 


Stochastic Exponential SDEs 


Consider finding solutions in the case when $\alpha(t) = 0$ and $\gamma(t) = 0$. The SDE 
becomes 

$$dU(t) = \beta(t)U(t)\,dt + \delta(t)U(t)\,dB(t). \tag{5.23}$$

This SDE is of the form 

$$dU(t) = U(t)\,dY(t), \tag{5.24}$$

where the Itô process $Y(t)$ is defined by 

$$dY(t) = \beta(t)\,dt + \delta(t)\,dB(t).$$

The SDE (5.23) is the stochastic exponential of $Y$, see Section 5.2. The 
stochastic exponential of $Y$ is given by 

$$U(t) = U(0)\mathcal{E}(Y)(t)$$

$$= U(0)\exp\left(Y(t) - Y(0) - \frac{1}{2}[Y,Y](t)\right)$$

$$= U(0)\exp\left(\int_0^t \beta(s)\,ds + \int_0^t \delta(s)\,dB(s) - \frac{1}{2}\int_0^t \delta^2(s)\,ds\right)$$

$$= U(0)\exp\left(\int_0^t \Big(\beta(s) - \frac{1}{2}\delta^2(s)\Big)ds + \int_0^t \delta(s)\,dB(s)\right), \tag{5.25}$$

where $[Y,Y](t)$ is obtained from calculations $d[Y,Y](t) = dY(t)\,dY(t) = \delta^2(t)\,dt$. 


General Linear SDEs 


To find a solution for the Equation (5.22) in the general case, look for a solution 
of the form 

$$X(t) = U(t)V(t), \tag{5.26}$$

where 

$$dU(t) = \beta(t)U(t)\,dt + \delta(t)U(t)\,dB(t), \tag{5.27}$$

and 

$$dV(t) = a(t)\,dt + b(t)\,dB(t). \tag{5.28}$$

Set $U(0) = 1$ and $V(0) = X(0)$. Note that $U$ is given by (5.25). Taking 
the differential of the product it is easy to see that we can choose coefficients 
$a(t)$ and $b(t)$ in such a way that relation $X(t) = U(t)V(t)$ holds. The desired 
coefficients $a(t)$ and $b(t)$ turn out to satisfy equations 

$$b(t)U(t) = \gamma(t), \quad \text{and} \quad a(t)U(t) = \alpha(t) - \delta(t)\gamma(t). \tag{5.29}$$
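
The step "taking the differential of the product" can be written out as follows. This is a sketch with the time argument suppressed, using only (5.22) and (5.26)-(5.28).

```latex
% Product rule for X = UV, with d[U,V] = \delta U b\,dt from (5.27) and (5.28).
dX = U\,dV + V\,dU + d[U,V]
   = \big(aU + \beta UV + \delta b U\big)\,dt + \big(bU + \delta UV\big)\,dB
   = \big(aU + \delta b U + \beta X\big)\,dt + \big(bU + \delta X\big)\,dB.

% Matching the dt and dB coefficients with (5.22):
bU = \gamma, \qquad aU + \delta b U = \alpha
\;\Longrightarrow\; aU = \alpha - \delta\gamma.
```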
Using the expression for $U(t)$, $a(t)$ and $b(t)$ are then determined. Thus $V(t)$ 
is obtained, and $X(t)$ is found to be 

$$X(t) = U(t)\left(X(0) + \int_0^t \frac{\alpha(s) - \delta(s)\gamma(s)}{U(s)}\,ds + \int_0^t \frac{\gamma(s)}{U(s)}\,dB(s)\right). \tag{5.30}$$


Langevin type SDE 


Let $X(t)$ satisfy 

$$dX(t) = a(t)X(t)\,dt + dB(t), \tag{5.31}$$

where $a(t)$ is a given adapted and continuous process. When $a(t) = -\alpha$ the 
equation is the Langevin equation, Example 5.6. 

We solve the SDE in two ways: by using the formula (5.30), and directly, 
similarly to Langevin’s SDE. 

Clearly, $\beta(t) = a(t)$, $\gamma(t) = 1$, and $\alpha(t) = \delta(t) = 0$. To find $U(t)$, we must 
solve $dU(t) = a(t)U(t)\,dt$, which gives $U(t) = e^{\int_0^t a(s)\,ds}$. Thus from (5.30) 

$$X(t) = e^{\int_0^t a(s)\,ds}\left(X(0) + \int_0^t e^{-\int_0^u a(s)\,ds}\,dB(u)\right).$$

Consider the process $e^{-\int_0^t a(s)\,ds}X(t)$ and use integration by parts. The 
process $e^{-\int_0^t a(s)\,ds}$ is continuous and is of finite variation. Therefore it has 
zero covariation with $X(t)$, hence 

$$d\left(e^{-\int_0^t a(s)\,ds}X(t)\right) = e^{-\int_0^t a(s)\,ds}\,dX(t) - a(t)e^{-\int_0^t a(s)\,ds}X(t)\,dt = e^{-\int_0^t a(s)\,ds}\,dB(t).$$

Integrating we obtain 

$$e^{-\int_0^t a(s)\,ds}X(t) = X(0) + \int_0^t e^{-\int_0^u a(s)\,ds}\,dB(u),$$

and finally 

$$X(t) = X(0)e^{\int_0^t a(s)\,ds} + e^{\int_0^t a(s)\,ds}\int_0^t e^{-\int_0^u a(s)\,ds}\,dB(u). \tag{5.32}$$
0 


Brownian Bridge 


Brownian Bridge, or pinned Brownian motion, is a solution to the following 
SDE 

$$dX(t) = \frac{b - X(t)}{T - t}\,dt + dB(t), \quad \text{for } 0 \le t < T, \quad X(0) = a. \tag{5.33}$$

This process is a transformed Brownian motion with fixed values at each end 
of the interval $[0,T]$, $X(0) = a$ and $X(T) = b$. The above SDE is a linear 
SDE, with 

$$\alpha(t) = \frac{b}{T - t}, \quad \beta(t) = -\frac{1}{T - t}, \quad \gamma(t) = 1, \quad \text{and} \quad \delta(t) = 0.$$

Identifying $U(t)$ and $V(t)$ in (5.30) we obtain 

$$X(t) = a\left(1 - \frac{t}{T}\right) + b\,\frac{t}{T} + (T - t)\int_0^t \frac{1}{T - s}\,dB(s), \quad \text{for } 0 \le t < T. \tag{5.34}$$

Since the function under the Itô integral is deterministic, and for any $t < T$, 
$\int_0^t ds/(T - s)^2 < \infty$, the process $\int_0^t \frac{1}{T - s}\,dB(s)$ is a martingale, moreover, it is 
Gaussian, by Theorem 4.11. Thus $X(t)$, on $[0,T)$, is a Gaussian process with 
initial value $X(0) = a$. The value at $T$, which is $X(T) = b$, is determined by 
continuity, see Example 5.11 below. 
Thus a Brownian Bridge is a continuous Gaussian process on $[0,T]$ with 
mean function $a(1 - t/T) + bt/T$ and covariance function 
$Cov(X(t), X(s)) = \min(s,t) - st/T$. 


Example 5.11: We show that $\lim_{t \uparrow T}(T - t)\int_0^t \frac{1}{T - s}\,dB(s) = 0$ almost surely. 
Using integration by parts (which is the same as the standard formula due to zero 
covariation between the deterministic term and Brownian motion), for any $t < T$ 

$$\int_0^t \frac{1}{T - s}\,dB(s) = \frac{B(t)}{T - t} - \int_0^t \frac{B(s)}{(T - s)^2}\,ds,$$

and 

$$(T - t)\int_0^t \frac{1}{T - s}\,dB(s) = B(t) - (T - t)\int_0^t \frac{B(s)}{(T - s)^2}\,ds. \tag{5.35}$$

It is an exercise in calculus (by changing variables $u = 1/(T - s)$, or considering 
integrals $\int_0^{T - \delta}$ and $\int_{T - \delta}^{t}$) to see that for any continuous function $g(s)$, 

$$\lim_{t \uparrow T}(T - t)\int_0^t \frac{g(s)}{(T - s)^2}\,ds = g(T).$$

Applying this with $g(s) = B(s)$ shows that the limit in (5.35) is zero. 
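
The pinning at the right end can also be seen in a step-by-step discretisation of the Brownian Bridge SDE: on the last grid step the drift term moves the path essentially all the way to $b$, leaving only the final Brownian increment. A hedged Python sketch, in which the function name `brownian_bridge`, the grid size and the seed are assumptions of this illustration:

```python
import math
import random


def brownian_bridge(a, b, T, n, rng):
    """Discretise dX = (b - X)/(T - t) dt + dB on a grid of n steps, X(0) = a."""
    dt = T / n
    x = a
    path = [x]
    for k in range(n):
        t = k * dt                      # left endpoint, so T - t >= dt > 0
        dB = rng.gauss(0.0, math.sqrt(dt))
        x = x + (b - x) / (T - t) * dt + dB
        path.append(x)
    return path


path = brownian_bridge(0.0, 1.0, 1.0, 10_000, random.Random(4))
end_gap = abs(path[-1] - 1.0)  # distance from b at the final grid point
```

In this discretisation the final value differs from $b$ by roughly one Brownian increment, which is of order $\sqrt{T/n}$.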


5.4 Existence and Uniqueness of Strong Solutions 


Let $X(t)$ satisfy 

$$dX(t) = \mu(X(t),t)\,dt + \sigma(X(t),t)\,dB(t). \tag{5.36}$$


Theorem 5.4 (Existence and Uniqueness) If the following conditions are 
satisfied 

1. Coefficients are locally Lipschitz in $x$ uniformly in $t$, that is, 
for every $T$ and $N$, there is a constant $K$ depending only on $T$ and $N$ 
such that for all $|x|, |y| \le N$ and all $0 \le t \le T$ 

$$|\mu(x,t) - \mu(y,t)| + |\sigma(x,t) - \sigma(y,t)| \le K|x - y|, \tag{5.37}$$

2. Coefficients satisfy the linear growth condition 

$$|\mu(x,t)| + |\sigma(x,t)| \le K(1 + |x|), \tag{5.38}$$

3. $X(0)$ is independent of $(B(t), 0 \le t \le T)$, and $EX^2(0) < \infty$. 

Then there exists a unique strong solution $X(t)$ of the SDE (5.36). $X(t)$ has 
continuous paths, moreover 

$$E\Big(\sup_{0 \le t \le T} X^2(t)\Big) < C\big(1 + E(X^2(0))\big), \tag{5.39}$$

where constant $C$ depends only on $K$ and $T$. 


The proof of existence is carried out by successive approximations, similar to 
that for ordinary differential equations (Picard iterations). It can be found 
in Friedman (1975), p.104-107, Gihman and Skorohod (1982), Rogers and 
Williams (1990). It is not hard to see, by using Gronwall’s lemma, that the 
Lipschitz condition implies uniqueness. 

The Lipschitz condition (5.37) holds if, for example, partial derivatives 
$\frac{\partial\mu}{\partial x}(x,t)$ and $\frac{\partial\sigma}{\partial x}(x,t)$ are bounded for $|x| \le N$ and all $0 \le t \le T$, which in 
turn is true if the derivatives are continuous (see Chapter 1). 


Less Stringent Conditions for Strong Solutions 


The next result is specific for one-dimensional SDEs. It is given for the case of 
time-independent coefficients. A similar result holds for time-dependent coef- 
ficients, see for example, Ethier and Kurtz (1986), p.298, Rogers and Williams 
(1990), p.265. 


Theorem 5.5 (Yamada-Watanabe) Suppose that $\mu(x)$ satisfies the Lipschitz 
condition and $\sigma(x)$ satisfies a Hölder condition of order $\alpha$, 
$\alpha \ge 1/2$, that is, there is a constant $K$ such that 

$$|\sigma(x) - \sigma(y)| \le K|x - y|^{\alpha}. \tag{5.40}$$

Then the strong solution exists and is unique. 


Example 5.12: (Girsanov’s SDE) 

$dX(t) = |X(t)|^r\,dB(t)$, $X(0) = 0$, $1/2 \le r < 1$. Note that for such $r$, $|x|^r$ is Hölder, 
but not Lipschitz (see section on these conditions in Chapter 1). $X(t) \equiv 0$ is a 
strong solution. Since the conditions of the Theorem are satisfied, $X(t) \equiv 0$ is the 
only solution. 


5.5 Markov Property of Solutions 


The Markov property asserts that given the present state of the process, the 
future is independent of the past. This can be stated as follows. If $\mathcal{F}_t$ denotes 
the $\sigma$-field generated by the process up to time $t$, then for any $0 \le s < t$ 

$$P(X(t) \le y \mid \mathcal{F}_s) = P(X(t) \le y \mid X(s)) \quad \text{a.s.} \tag{5.41}$$

It is intuitively clear from the heuristic Equation (5.4) that solutions to SDEs 
should have the Markov property. For a small $\Delta$, given $X(t) = x$, $X(t+\Delta)$ 
depends on $B(t+\Delta) - B(t)$, which is independent of the past. 

We don’t prove that strong solutions possess the Markov property. However, 
by the construction of the solution on the canonical space (a weak solution), 
it can be seen that the Markov property holds. 


Transition Function. 


Markov processes are characterized by the transition probability function. 
Denote by 

$$P(y,t,x,s) = P(X(t) \le y \mid X(s) = x) \tag{5.42}$$

the conditional distribution function of the random variable $X(t)$ given that 
$X(s) = x$, i.e. the distribution of the values at time $t$ given that the process 
was in the state $x$ at time $s$. 


Theorem 5.6 Let $X(t)$ be a solution to the SDE (5.36). Then $X(t)$ has the 
Markov property. 


Using the law of total probability, by conditioning on all possible values $z$ of 
the process at time $u$, for $s < u < t$, we obtain that the transition probability 
function $P(y,t,x,s)$ in (5.42) satisfies the Chapman-Kolmogorov equation 

$$P(y,t,x,s) = \int_{-\infty}^{\infty} P(y,t,z,u)\,P(dz,u,x,s), \quad \text{for any } s < u < t. \tag{5.43}$$


In fact any function that satisfies this equation and is a distribution func- 
tion in y for fixed values of the other arguments, is a transition function of 
some Markov process. 
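
As an illustration, the Chapman-Kolmogorov equation (5.43) can be checked numerically for the transition function given by the Normal $N(x, t-s)$ distribution function, for which $P(dz,u,x,s)$ has the Normal density in $z$. The sketch below uses a plain trapezoidal rule; the function names, the integration range and the test point are assumptions made for this example.

```python
import math


def Phi(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def P(y, t, x, s):
    """Transition function: distribution function of N(x, t - s) at y."""
    return Phi((y - x) / math.sqrt(t - s))


def density(z, u, x, s):
    """Density in z of N(x, u - s), so that P(dz, u, x, s) = density * dz."""
    return math.exp(-(z - x) ** 2 / (2.0 * (u - s))) / math.sqrt(2.0 * math.pi * (u - s))


def ck_rhs(y, t, x, s, u, lo=-12.0, hi=12.0, m=4800):
    """Right-hand side of (5.43) by the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / m
    total = 0.0
    for i in range(m + 1):
        z = lo + i * h
        w = 0.5 if i in (0, m) else 1.0  # trapezoid end-point weights
        total += w * P(y, t, z, u) * density(z, u, x, s)
    return total * h


lhs = P(-0.5, 1.0, 0.3, 0.0)            # left-hand side of (5.43)
rhs = ck_rhs(-0.5, 1.0, 0.3, 0.0, 0.4)  # intermediate time u = 0.4
```

Any intermediate time `u` strictly between `s` and `t` should give the same value up to quadrature error.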


Example 5.13: If $P(y,t,x,s) = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi(t-s)}}\,e^{-\frac{(u-x)^2}{2(t-s)}}\,du$ is the cumulative 
distribution function of the Normal $N(x, t-s)$ distribution, then the corresponding diffusion 
process is Brownian motion. Indeed, $P(B(t) \le y \mid \mathcal{F}_s) = P(B(t) \le y \mid B(s))$, and the 
conditional distribution of $B(t)$ given $B(s) = x$ is $N(x, t-s)$. 


Example 5.14: Let $X(t)$ solve the SDE $dX(t) = \mu X(t)\,dt + \sigma X(t)\,dB(t)$ for some 
constants $\mu$ and $\sigma$. We know (see Example 5.5) that $X(t) = X(0)e^{(\mu - \sigma^2/2)t + \sigma B(t)}$. 
Hence $X(t) = X(s)e^{(\mu - \sigma^2/2)(t-s) + \sigma(B(t) - B(s))}$ and its transition probability function is 

$$P(X(t) \le y \mid X(s) = x) = P\big(X(s)e^{(\mu - \sigma^2/2)(t-s) + \sigma(B(t) - B(s))} \le y \mid X(s) = x\big)$$

$$= P\big(xe^{(\mu - \sigma^2/2)(t-s) + \sigma(B(t) - B(s))} \le y \mid X(s) = x\big).$$

Using independence of $B(t) - B(s)$ and $X(s)$, the conditional probability is given by 

$$P\big(xe^{(\mu - \sigma^2/2)(t-s) + \sigma(B(t) - B(s))} \le y\big) = P\big(e^{(\mu - \sigma^2/2)(t-s) + \sigma(B(t) - B(s))} \le y/x\big).$$

Thus $P(y,t,x,s) = \Phi\left(\frac{\ln(y/x) - (\mu - \sigma^2/2)(t-s)}{\sigma\sqrt{t-s}}\right)$. 
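
The transition function of Example 5.14 is easy to evaluate with the error function. The following Python sketch assumes $x > 0$ and $\sigma > 0$; the function name `gbm_transition` and the numerical values are illustrative choices, not part of the text.

```python
import math


def Phi(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gbm_transition(y, t, x, s, mu, sigma):
    """P(X(t) <= y | X(s) = x) for dX = mu X dt + sigma X dB, with x > 0, sigma > 0."""
    if y <= 0.0:
        return 0.0  # the process started at x > 0 stays positive
    tau = t - s
    z = (math.log(y / x) - (mu - 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return Phi(z)


x, s, t, mu, sigma = 2.0, 0.5, 1.5, 0.1, 0.3  # assumed illustrative values
# The conditional median of X(t) given X(s) = x is where the argument of Phi is zero.
median_value = x * math.exp((mu - 0.5 * sigma**2) * (t - s))
p_at_median = gbm_transition(median_value, t, x, s, mu, sigma)
```

Evaluating the function at the conditional median is a quick consistency check, since the argument of $\Phi$ vanishes there.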


Remark 5.4: Introduce a useful representation, which requires us to keep 
track of when and where the process starts. Denote by $X_s^x(t)$ the value of 
the process at time $t$ when it starts at time $s$ from the point $x$. It is clear 
that for $0 \le s < t$, $X_0^x(t) = X_s^{X_0^x(s)}(t)$. The Markov property states that 
conditionally on $X_s^x(t) = y$, the processes $X_s^x(u)$, $s \le u \le t$, and $X_t^y(u)$, 
$t \le u$, are independent. 


Definition 5.7 A process has the strong Markov property if the relation (5.41) 
holds when a non-random time $s$ is replaced by a finite stopping time $\tau$. 


Solutions to SDEs also have the strong Markov property, meaning that 
given the history up to a stopping time $\tau$, the behaviour of the process at 
some future time $t$ is independent of the past. See also Section 3.4. 


If an SDE has a strong solution X(t), then X(t) has a transition probability 
function P(y,t, x, s). This function can be found as a solution to the Forward 
or the Backward partial differential equations, see Section 5.8. 

A transition probability function P(y, t, x, s) may exist for SDEs without a 
strong solution. This function in turn determines a Markov process uniquely 
(all finite-dimensional distributions). This process is known as a weak solution 
to an SDE. In this way one can define a solution for an SDE under less stringent 
conditions on its coefficients. The concepts of the weak solution are considered 
next. 


5.6 Weak Solutions to SDEs 


The concept of weak solutions allows us to give a meaning to an SDE when 
strong solutions do not exist. Weak solutions are solutions in distribution, 
they can be realized (defined) on some other probability space and exist under 
less stringent conditions on the coefficients of the SDE. 


Definition 5.8 If there exist a probability space with a filtration, a Brownian 
motion $B(t)$ and a process $X(t)$ adapted to that filtration, such that: $X(0)$ 
has the given distribution, for all $t$ the integrals below are defined, and $X(t)$ 
satisfies 

$$X(t) = X(0) + \int_0^t \mu(X(u),u)\,du + \int_0^t \sigma(X(u),u)\,dB(u), \tag{5.44}$$

then $X(t)$ is called a weak solution to the SDE 

$$dX(t) = \mu(X(t),t)\,dt + \sigma(X(t),t)\,dB(t). \tag{5.45}$$


Definition 5.9 A weak solution is called unique if whenever X(t) and X'(t) 
are two solutions (perhaps on different probability spaces) such that the distri- 
butions of X(0) and X'(0) are the same, then all finite-dimensional distribu- 
tions of X(t) and X'(t) are the same. 


Clearly, by definition, a strong solution is also a weak solution. Uniqueness 
of the strong solution (pathwise uniqueness) implies uniqueness of the weak 
solution, (a result of Yamada and Watanabe (1971)). In the next example a 
strong solution does not exist, but a weak solution exists and is unique. 


Example 5.15: (Tanaka’s SDE) 

$$dX(t) = \operatorname{sign}(X(t))\,dB(t), \tag{5.46}$$

where 

$$\operatorname{sign}(x) = \begin{cases} 1 & \text{if } x > 0, \\ -1 & \text{if } x \le 0. \end{cases}$$

Since $\sigma(x) = \operatorname{sign}(x)$ is discontinuous, it is not Lipschitz, and conditions for the 
strong existence fail. It can be shown that a strong solution to Tanaka’s SDE does 
not exist, see, for example, Gihman and Skorohod (1982), Rogers and Williams (1990) 
p.151. We show that the Brownian motion is the unique weak solution of Tanaka’s 
SDE. Let $X(t)$ be some Brownian motion. Consider the process 

$$Y(t) = \int_0^t \frac{1}{\operatorname{sign}(X(s))}\,dX(s) = \int_0^t \operatorname{sign}(X(s))\,dX(s).$$

$\operatorname{sign}(X(t))$ is adapted, $\int_0^T (\operatorname{sign}(X(t)))^2\,dt = T < \infty$, and $Y(t)$ is well defined and is 
a continuous martingale. 

$$[Y,Y](t) = \int_0^t \operatorname{sign}^2(X(s))\,d[X,X](s) = \int_0^t ds = t.$$

By Levy’s theorem (which is proven later), $Y(t)$ is a Brownian motion, call it $B(t)$, 

$$B(t) = \int_0^t \frac{dX(s)}{\operatorname{sign}(X(s))}.$$

Rewrite the last equality in the differential notation to obtain Tanaka’s SDE. Levy’s 
characterization theorem implies also that any weak solution is a Brownian motion. 


Example 5.16: (Girsanov’s SDE) The equation 

$$dX(t) = |X(t)|^r\,dB(t), \tag{5.47}$$

$r > 0$, $t \ge 0$, has a strong solution $X(t) \equiv 0$. For $r \ge 1/2$, this is the only strong 
solution by Theorem 5.5. Therefore there are no weak solutions other than zero. For 
$0 < r < 1/2$ the SDE has infinitely many weak solutions (Rogers and Williams 
(1990) p.175). Therefore there is no strong uniqueness in this case, otherwise it 
would have only one weak solution. Compare this to the non-uniqueness of solution 
of the equation $dx(t) = 2\sqrt{|x(t)|}\,dt$, which has solutions $x(t) = 0$ and $x(t) = t^2$. 
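
Returning to Tanaka’s SDE (5.46), the construction of Example 5.15 can be mimicked on a time grid: start from increments of a Brownian path $X$, form $Y$ by multiplying each increment by the sign of $X$ at the left endpoint, and observe that the squared increments of $Y$ and $X$ coincide term by term. A minimal Python sketch; the grid size, the seed and all names are assumptions of this illustration.

```python
import math
import random


def sign(x):
    """sign(x) as in Example 5.15: 1 if x > 0, -1 if x <= 0."""
    return 1.0 if x > 0 else -1.0


n, T = 10_000, 1.0  # assumed grid size and time horizon
dt = T / n
rng = random.Random(5)

x = y = 0.0
qv_x = qv_y = 0.0  # sums of squared increments of X and of Y
for _ in range(n):
    dX = rng.gauss(0.0, math.sqrt(dt))
    dY = sign(x) * dX  # discrete version of dY = sign(X) dX
    qv_x += dX * dX
    qv_y += dY * dY
    x += dX
    y += dY
```

Both sums are close to $T$, in line with $[Y,Y](t) = t$. The simulation illustrates the weak construction only and says nothing about strong solutions.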


5.7 Construction of Weak Solutions 


In this section we give results on the existence and uniqueness of weak solutions 
to SDEs. Construction of weak solutions requires more advanced knowledge, 
and this section can be skipped. 


Theorem 5.10 If for each $t > 0$, functions $\mu(x,t)$ and $\sigma(x,t)$ are bounded 
and continuous then the SDE (5.45) has at least one weak solution starting at 
time $s$ at point $x$, for all $s$ and $x$. If in addition their partial derivatives with 
respect to x up to order two are also bounded and continuous, then the SDE 
(5.45) has a unique weak solution starting at time s at point x. Moreover this 
solution has the strong Markov property. 


These results are proved in Stroock and Varadhan (1979), ch. 6. But better 
conditions are available, Stroock and Varadhan (1979) Corollary 6.5.5, see also 
Pinsky (1995) Theorem 1.10.2. 


Theorem 5.11 If $\sigma(x,t)$ is positive and continuous and for any $T > 0$ there 
is $K_T$ such that for all $x \in \mathbb{R}$ 

$$|\mu(x,t)| + |\sigma(x,t)| \le K_T(1 + |x|), \tag{5.48}$$

then there exists a unique weak solution to SDE (5.45) starting at any point 
$x \in \mathbb{R}$ at any time $s \ge 0$, moreover it has the strong Markov property. 


Canonical Space for Diffusions 


Solutions to SDEs or diffusions can be realized on the probability space of 
continuous functions. We indicate how to define probability on this space by 
means of a transition function, how to find the transition function from the 
given SDE, and how to verify that the constructed process indeed satisfies the 
given SDE. 


Probability Space $(\Omega, \mathcal{F}, \mathbb{F})$ 


Weak solutions can be constructed on the canonical space Q = C (l0, o0)) of 
continuous functions from [0,00) to R. The Borel o-field on Q is the one 
generated by the open sets. Open sets in turn, are defined with the help 
of a metric, for example, an open ball of radius € centered at w is the set 
Delw) = {w : d(w,w’) < e}. The distance between two continuous functions 
w 1 and wz is taken as 


SUPo<t<n w(t) — w2(t)| 


"1+ supoce<n [wi (t) = wD 


= 
E 
§ 
II 
Ms 
i 


Convergence of the elements of $\Omega$ in this metric is the uniform convergence
of functions on bounded closed intervals $[0,T]$. Diffusions on a finite interval
$[0,T]$ can be realized on the space $C([0,T])$ with the metric

$$d(\omega_1,\omega_2) = \sup_{0\le t\le T}|\omega_1(t)-\omega_2(t)|.$$

The canonical process $X(t)$ is defined by $X(t,\omega) = \omega(t)$, $0 \le t < \infty$. It
is known (for example, Dudley (1989) p.356) that the Borel $\sigma$-field $\mathcal{F}$ on
$C([0,\infty))$ is given by $\sigma(X(t), 0 \le t < \infty)$. The filtration is defined by the
$\sigma$-fields $\mathcal{F}_t = \sigma(X(s), 0 \le s \le t)$.
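
The metric on $C([0,\infty))$ can be explored numerically. The following is a
minimal sketch, assuming the usual form of the metric,
$d(\omega_1,\omega_2) = \sum_n 2^{-n} s_n/(1+s_n)$ with $s_n = \sup_{0\le t\le n}|\omega_1(t)-\omega_2(t)|$.
The function name `canonical_metric`, the grid used to approximate each
supremum and the truncation of the series are illustrative assumptions, not
part of the text.

```python
import math

def canonical_metric(w1, w2, n_terms=30, grid_per_unit=200):
    """Approximate d(w1, w2) = sum_n 2^-n * s_n / (1 + s_n),
    where s_n = sup over [0, n] of |w1(t) - w2(t)|.
    Assumptions: the sup is taken over a uniform grid and the series
    is truncated after n_terms terms."""
    total = 0.0
    running_sup = 0.0
    for n in range(1, n_terms + 1):
        # extend the running supremum from [0, n-1] to [0, n]
        for k in range((n - 1) * grid_per_unit, n * grid_per_unit + 1):
            t = k / grid_per_unit
            running_sup = max(running_sup, abs(w1(t) - w2(t)))
        total += 2.0 ** (-n) * running_sup / (1.0 + running_sup)
    return total

zero = lambda t: 0.0
line = lambda t: t

d_same = canonical_metric(line, line)   # a path is at distance 0 from itself
d_diff = canonical_metric(zero, line)   # here s_n = n, so d = sum_n 2^-n n/(n+1)
```

For the paths $\omega_1(t) = 0$ and $\omega_2(t) = t$ the series can be summed in closed
form, $\sum_n 2^{-n} n/(n+1) = 2 - 2\log 2$, although the two paths are not
uniformly close on $[0,\infty)$. The distance is always less than 1.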


Probability Measure 


We outline the construction of probability measures from a given transition 
function P(y, t,x, s). In particular, this construction gives the Wiener measure 
that corresponds to the Brownian motion process. 

For any fixed $x \in \mathbb{R}$ and $s \ge 0$, a probability $P = P_{x,s}$ on $(\Omega, \mathcal{F})$ can be
constructed by using properties


1. $P(X(u) = x, 0 \le u \le s) = 1$.
2. $P(X(t_2) \in B \mid \mathcal{F}_{t_1}) = P(B, t_2, X(t_1), t_1)$.


The second property asserts that for any Borel sets $A, B \subset \mathbb{R}$ we have

$$\begin{aligned}
P_{t_1,t_2}(A \times B) &:= P(X(t_1) \in A, X(t_2) \in B) \\
&= E\big(P(X(t_2) \in B \mid \mathcal{F}_{t_1})\, I(X(t_1) \in A)\big) \\
&= E\big(P(B, t_2, X(t_1), t_1)\, I(X(t_1) \in A)\big) \\
&= \int_A \int_B P(dy_2, t_2, y_1, t_1)\, P_{t_1}(dy_1),
\end{aligned}$$

where $P_{t_1}(C) = P(X(t_1) \in C)$. This extends to the $n$-dimensional cylinder
sets $\{\omega \in \Omega : (\omega(t_1), \dots, \omega(t_n)) \in J_n\}$, where $J_n \subset \mathbb{R}^n$, by

$$P_{t_1,\dots,t_n}(J_n) = \int_{J_n} P(dy_n, t_n, y_{n-1}, t_{n-1}) \cdots
P(dy_2, t_2, y_1, t_1)\, P_{t_1}(dy_1).$$

These probabilities give the finite dimensional distributions
$P((\omega(t_1), \dots, \omega(t_n)) \in J_n)$. Consistency of these probabilities is a consequence
of the Chapman-Kolmogorov equation for the transition function. Thus by
Kolmogorov's extension theorem $P$ can be extended in a unique way to $\mathcal{F}$.
This probability measure $P = P_{x,s}$ corresponds to the Markov process started
at $x$ at time $s$, denoted earlier by $X_s^x(t)$. Thus any transition function defines
a probability so that the canonical process is a Markov process. We described
in particular a construction of the Wiener measure, or Brownian motion.
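
The construction can be imitated on a computer by chaining draws from the
transition function. The sketch below assumes the Brownian transition
function, $P(dy, t, x, s) = N(x, t-s)$, so that it samples finite-dimensional
distributions of the Wiener measure; the function name `sample_fdd`, the seed
and the sample size are illustrative choices.

```python
import numpy as np

def sample_fdd(times, x=0.0, s=0.0, n_paths=100_000, seed=0):
    """Sample (X(t_1), ..., X(t_n)) under P_{x,s} by chaining transitions.
    Assumption: Brownian transition function, P(dy, t, x, s) = N(x, t - s)."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_paths, len(times)))
    prev_value = np.full(n_paths, float(x))
    prev_time = s
    for j, t in enumerate(times):
        # draw X(t_j) given X(t_{j-1}) from the transition function
        step = np.sqrt(t - prev_time) * rng.standard_normal(n_paths)
        prev_value = prev_value + step
        out[:, j] = prev_value
        prev_time = t
    return out

times = [0.5, 1.0, 2.0]
samples = sample_fdd(times)
cov = np.cov(samples, rowvar=False)      # empirical covariance matrix
expected = np.minimum.outer(times, times)  # Brownian covariance min(t_i, t_j)
```

The empirical covariance of the sampled vector should be close to
$\min(t_i, t_j)$, the covariance function of Brownian motion.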


Transition Function 


Under appropriate conditions on the coefficients $\mu(x,t)$ and $\sigma(x,t)$, $P(y,t,x,s)$
is determined from a partial differential equation (PDE),

$$\frac{\partial u}{\partial s}(x,s) + L_s u(x,s) = 0, \tag{5.49}$$

called the backward PDE, involving a second order differential operator $L_s$,

$$L_s f(x,s) = (L_s f)(x,s) = \frac{1}{2}\sigma^2(x,s)\frac{\partial^2 f}{\partial x^2}(x,s)
+ \mu(x,s)\frac{\partial f}{\partial x}(x,s). \tag{5.50}$$

It follows from the key property of the transition function, that

$$f(X(t)) - \int_s^t (L_u f)(X(u))\,du \tag{5.51}$$

is a martingale under $P_{x,s}$ with respect to $\mathcal{F}_t$ for $t \ge s$, for any twice
continuously differentiable function $f$ vanishing outside a finite interval (with
compact support), $f \in C_K^2(\mathbb{R})$.


SDE on the Canonical Space is Satisfied 


Extra concepts (that of local martingales and their integrals) are needed to
prove the claim rigorously. The main idea is as follows. Suppose that (5.51)
holds for functions $f(x) = x$ and $f(x) = x^2$. (Although they don't have a
compact support, they can be approximated by $C_K^2$ functions on any finite
interval.) Applying (5.51) to the linear function, we obtain that

$$Y(t) = X(t) - \int_s^t \mu(X(u),u)\,du \tag{5.52}$$

is a martingale. Applying (5.51) to the quadratic function, we obtain that

$$X^2(t) - \int_s^t \big(\sigma^2(X(u),u) + 2\mu(X(u),u)X(u)\big)\,du \tag{5.53}$$

is a martingale. By the characterization property of quadratic variation for
continuous martingales $Y^2(t) - [Y,Y](t)$ is a martingale, and it follows from the
above relations that $[Y,Y](t) = \int_s^t \sigma^2(X(u),u)\,du$. One can define the Itô
integral process $B(t) = \int_s^t dY(u)/\sigma(X(u),u)$. From the properties of stochastic
integrals it follows that $B(t)$ is a continuous local martingale with
$[B,B](t) = t - s$ for $t \ge s$. Thus by Lévy's theorem $B(t)$ is a Brownian motion
(started at time $s$). Putting all of the above together and using differential
notation, the required SDE is obtained. For details, see for example, Rogers
and Williams (1990), p.160, also Stroock and Varadhan (1979).


Weak Solutions and the Martingale Problem 


Taking the relation (5.51) as primary, Stroock and Varadhan defined a weak
solution to the SDE

$$dX(t) = \mu(X(t),t)\,dt + \sigma(X(t),t)\,dB(t) \tag{5.54}$$

as a solution to the so-called martingale problem.


Definition 5.12 The martingale problem for the coefficients, or the operator
$L_s$, is as follows. For each $x \in \mathbb{R}$ and $s > 0$, find a probability measure $P_{x,s}$
on $(\Omega, \mathcal{F})$ such that


1. $P_{x,s}(X(u) = x, 0 \le u \le s) = 1$,
2. For any twice continuously differentiable function $f$ vanishing outside
a finite interval the following process is a martingale under $P_{x,s}$ with
respect to $\mathcal{F}_t$:

$$f(X(t)) - \int_s^t (L_u f)(X(u))\,du. \tag{5.55}$$

In the case when there is exactly one solution to the martingale problem, it is
said that the martingale problem is well-posed.


Example 5.17: Brownian motion $B(t)$ is a solution to the martingale problem
for the Laplace operator $L = \frac{1}{2}\frac{d^2}{dx^2}$, that is, for a twice continuously
differentiable function $f$ vanishing outside a finite interval

$$f(B(t)) - \int_0^t \frac{1}{2} f''(B(s))\,ds$$

is a martingale. Since Brownian motion exists and is determined by its distribution
uniquely, the martingale problem for $L$ is well-posed.


Remark 5.5: Note that if a function vanishes outside a finite interval, then
its derivatives also vanish outside that interval. Thus for a twice continuously
differentiable function $f$ vanishing outside a finite interval ($f \in C_K^2$), $(L_s f)$
exists, is continuous and vanishes outside that interval. This assures that the
expectation of the process in (5.55) exists. If one demands only that $f$ is twice
continuously differentiable with bounded derivatives ($f \in C_b^2$), then $(L_s f)$
exists but may not be bounded, and the expectation in (5.55) may not exist. If
one takes $f \in C_b^2$, then one seeks solutions to the local martingale problem,
and any such solution makes the process in (5.55) into a local martingale.
Local martingales are covered in Chapter 7.
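
As a quick sanity check of Example 5.17, one can compute the expectation of
the process for a specific function. The choice $f(x) = \cos x$ below is an
illustrative assumption: it belongs to $C_b^2$ rather than $C_K^2$, but for
Brownian motion $Lf = -\frac{1}{2}\cos x$ is bounded, so all expectations exist.
Constancy of the expectation is only a necessary condition for the martingale
property, so this is a consistency check and not a proof.

```latex
% Uses E cos(B(t)) = e^{-t/2}, since B(t) ~ N(0,t).
\begin{aligned}
E\Big( f(B(t)) - \int_0^t \tfrac{1}{2} f''(B(s))\,ds \Big)
  &= E\cos B(t) + \tfrac{1}{2}\int_0^t E\cos B(s)\,ds \\
  &= e^{-t/2} + \tfrac{1}{2}\int_0^t e^{-s/2}\,ds \\
  &= e^{-t/2} + \big(1 - e^{-t/2}\big) = 1 = f(0).
\end{aligned}
```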


As there are two definitions of weak solutions, Definition 5.8 and Definition
5.12, we show that they are the same.


Theorem 5.13 Weak solutions in the sense of Definition 5.8 and in the sense 
of Definition 5.12 are equivalent. 


PROOF: We already indicated the proof in one direction, that if the
martingale problem has a solution, then the solution satisfies the SDE. The other
direction is obtained by using Itô's formula. Let $X(t)$ be a weak solution in
the sense of Definition 5.8. Then there is a space supporting Brownian motion
$B(t)$ so that

$$X(t) = X(s) + \int_s^t \mu(X(u),u)\,du + \int_s^t \sigma(X(u),u)\,dB(u),
\quad\text{and } X(s) = x, \tag{5.56}$$

is satisfied for all $t \ge s$. Let $f$ be twice continuously differentiable with compact
support. Applying Itô's formula to $f(X(t))$, we have

$$f(X(t)) = f(X(s)) + \int_s^t (L_u f)(X(u))\,du
+ \int_s^t f'(X(u))\sigma(X(u),u)\,dB(u). \tag{5.57}$$

Thus

$$f(X(t)) - \int_s^t (L_u f)(X(u))\,du = f(X(s))
+ \int_s^t f'(X(u))\sigma(X(u),u)\,dB(u). \tag{5.58}$$

Since $f$ and its derivatives vanish outside an interval, say $[-K, K]$, the
functions $f'(x)\sigma(x,u)$ also vanish outside this interval, for any $u$. Assuming that
$\sigma(x,u)$ are bounded in $x$ on finite intervals with the same constant for all $u$, it
follows that $|f'(x)\sigma(x,u)| \le K_1$. Thus the integral $\int_s^t f'(X(u))\sigma(X(u),u)\,dB(u)$
is a martingale in $t$, for $t \ge s$; thus the martingale problem has a solution.


5.8 Backward and Forward Equations


In many applications, such as Physics, Engineering and Finance, the
importance of diffusions lies in their connection to PDEs, and often diffusions are
specified by a PDE called the Fokker-Planck equation (introduced below (5.62),
see for example, Soize (1994)). Although PDEs are hard to solve in closed
form, they can be easily solved numerically. In practice it is often enough to
check that conditions of the existence and uniqueness result are satisfied and
then the solution can be computed by a PDE solver to the desired degree of
accuracy.

In this section it is outlined how to obtain the transition function that
determines the weak solution to an SDE

$$dX(t) = \mu(X(t),t)\,dt + \sigma(X(t),t)\,dB(t), \quad\text{for } t \ge 0. \tag{5.59}$$


The results below are the main results from the theory of PDEs which are 
used for construction of diffusions (see for example Friedman (1975), Stroock 
and Varadhan (1979)). 

Define the differential operator $L_s$, $0 \le s \le T$, by

$$L_s f(x,s) = (L_s f)(x,s) = \frac{1}{2}\sigma^2(x,s)\frac{\partial^2 f}{\partial x^2}(x,s)
+ \mu(x,s)\frac{\partial f}{\partial x}(x,s). \tag{5.60}$$

The operator $L_s$ acts on twice continuously differentiable in $x$ functions $f(x,s)$,
and the result of its action on $f(x,s)$ is another function, denoted by $(L_s f)$,
the values of which at point $(x,s)$ are given by (5.60).


Definition 5.14 A fundamental solution of the PDE

$$\frac{\partial u}{\partial s}(x,s) + L_s u(x,s) = 0 \tag{5.61}$$

is a non-negative function $p(y,t,x,s)$ with the following properties:


1. it is jointly continuous in $y, t, x, s$, twice continuously differentiable in $x$,
and satisfies equation (5.61) with respect to $s$ and $x$.


2. for any bounded continuous function $g(x)$ on $\mathbb{R}$, and any $t > 0$

$$u(x,s) = \int_{\mathbb{R}} g(y)\,p(y,t,x,s)\,dy$$

is bounded, satisfies equation (5.61) and $\lim_{s \uparrow t} u(x,s) = g(x)$, for $x \in \mathbb{R}$.


Theorem 5.15 Suppose that $\sigma(x,t)$ and $\mu(x,t)$ are bounded and continuous
functions such that

(A1) $\sigma^2(x,t) \ge c > 0$;

(A2) $\mu(x,t)$ and $\sigma^2(x,t)$ satisfy a Hölder condition with respect to $x$ and $t$,
that is, for all $x, y \in \mathbb{R}$ and $s, t > 0$

$$|\mu(y,t) - \mu(x,s)| + |\sigma^2(y,t) - \sigma^2(x,s)| \le K\big(|y-x|^\alpha + |t-s|^\alpha\big).$$

Then the PDE (5.61) has a fundamental solution $p(y,t,x,s)$, which is unique,
and is strictly positive.

If in addition $\mu(x,t)$ and $\sigma(x,t)$ have two partial derivatives with respect
to $x$, which are bounded and satisfy a Hölder condition with respect to $x$, then
$p(y,t,x,s)$ as a function in $y$ and $t$, satisfies the PDE

$$-\frac{\partial p}{\partial t} + \frac{1}{2}\frac{\partial^2}{\partial y^2}\big(\sigma^2(y,t)\,p\big)
- \frac{\partial}{\partial y}\big(\mu(y,t)\,p\big) = 0. \tag{5.62}$$


Theorem 5.16 Suppose coefficients of $L_s$ in (5.60) satisfy conditions (A1)
and (A2) of Theorem 5.15. Then PDE (5.61) has a unique fundamental
solution $p(y,t,x,s)$. The function $P(y,t,x,s) = \int_{-\infty}^{y} p(u,t,x,s)\,du$ uniquely
defines a transition probability function. Moreover, this function has the property
that for any bounded function $f(x,t)$ twice continuously differentiable in $x$ and
once continuously differentiable in $t$ ($f \in C_b^{2,1}(\mathbb{R} \times [0,t])$)

$$\int_{\mathbb{R}} f(y,t)\,P(dy,t,x,s) - f(x,s)
= \int_s^t \int_{\mathbb{R}} \Big(\frac{\partial}{\partial u} + L_u\Big) f(y,u)\,P(dy,u,x,s)\,du \tag{5.63}$$

for all $0 \le s < t$, $x \in \mathbb{R}$.


The transition function $P(y,t,x,s)$ in the above theorem defines uniquely
a Markov process $X(t)$, that is, for all $x, y$ and $0 \le s < t$

$$P(y,t,x,s) = P(X(t) \le y \mid X(s) = x). \tag{5.64}$$

Equation (5.61) is a PDE in the backward variables $(x,s)$ and is therefore
called the backward equation, also known as Kolmogorov's backward equation.
Equation (5.62) is a PDE in the forward variables $(y,t)$ and is therefore called
the forward equation, also known as the Fokker-Planck equation, diffusion
equation, or Kolmogorov's forward equation.

The process $X(t)$ is called a diffusion, the differential operator $L_s$ is called
its generator. The property (5.63) implies that $X(t)$ satisfies the SDE (5.59).
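
The backward and forward equations can be checked numerically on a case
where the transition density is known in closed form. The sketch below uses
the Ornstein-Uhlenbeck equation $dX = -aX\,dt + \sigma\,dB$ as an illustration.
This is an assumption made for the example: its drift is unbounded, so it is
not covered by Theorem 5.15 as stated, but its Gaussian transition density is
classical. The parameter values, the evaluation point and the finite-difference
step are arbitrary choices.

```python
import math

A, SIGMA = 0.7, 1.3   # illustrative (hypothetical) parameters

def p(y, t, x, s):
    """Transition density of dX = -A X dt + SIGMA dB (Ornstein-Uhlenbeck)."""
    tau = t - s
    mean = x * math.exp(-A * tau)
    var = SIGMA**2 * (1.0 - math.exp(-2.0 * A * tau)) / (2.0 * A)
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def d1(g, z, h=1e-3):
    """Central difference approximation of g'(z)."""
    return (g(z + h) - g(z - h)) / (2.0 * h)

def d2(g, z, h=1e-3):
    """Central difference approximation of g''(z)."""
    return (g(z + h) - 2.0 * g(z) + g(z - h)) / h**2

y, t, x, s = 0.4, 1.5, -0.3, 0.5

# backward equation in (x, s): dp/ds + 0.5 sigma^2 p_xx + mu(x) p_x = 0
backward = (d1(lambda s_: p(y, t, x, s_), s)
            + 0.5 * SIGMA**2 * d2(lambda x_: p(y, t, x_, s), x)
            - A * x * d1(lambda x_: p(y, t, x_, s), x))

# forward equation in (y, t): -dp/dt + 0.5 (sigma^2 p)_yy - (mu(y) p)_y = 0
forward = (-d1(lambda t_: p(y, t_, x, s), t)
           + 0.5 * SIGMA**2 * d2(lambda y_: p(y_, t, x, s), y)
           + d1(lambda y_: A * y_ * p(y_, t, x, s), y))
```

Both residuals should vanish up to the discretization error of the finite
differences. Note that the drift enters the forward equation inside the
derivative, $-\partial(\mu p)/\partial y$, whereas in the backward equation it multiplies
$\partial p/\partial x$.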


Remark 5.6: A weak solution exists and is unique, possesses the strong Markov
property, and has a density under the conditions of Theorem 5.11, much weaker
than those of Theorem 5.15.


5.9 Stratonovich Stochastic Calculus


Stochastic integrals in applications are often taken in the sense of Stratonovich
calculus. This calculus is designed in such a way that its basic rules, such as
the chain rule and integration by parts, are the same as in the standard calculus
(e.g. Rogers and Williams (1990) p.106). Although the rules of manipulation
are the same, the calculi are still very different. The processes need to be
adapted, just as in Itô calculus. Since Stratonovich stochastic integrals can be
reduced to Itô integrals, the standard SDE theory can be used for Stratonovich
stochastic differential equations. Note also that the Stratonovich integral is
more suited for generalizations of stochastic calculus on manifolds (see Rogers
and Williams (1990)).

A direct definition of the Stratonovich integral, denoted $\int_0^t Y(s)\,\partial X(s)$, is
done as a limit in mean-square ($L^2$) of Stratonovich approximating sums

$$\sum_{i=0}^{n-1} \frac{1}{2}\big(Y(t_{i+1}^n) + Y(t_i^n)\big)\big(X(t_{i+1}^n) - X(t_i^n)\big), \tag{5.65}$$

when partitions $\{t_i^n\}$ become finer and finer. In Stratonovich approximating
sums the average value of $Y$ on the interval $(t_i^n, t_{i+1}^n)$, $\frac{1}{2}\big(Y(t_{i+1}^n) + Y(t_i^n)\big)$,
is taken, whereas in the Itô integral the left-most value $Y(t_i^n)$ is taken. An
alternative definition of the Stratonovich integral is given by using the Itô
integral.
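
The difference between the two kinds of approximating sums is easy to see
numerically. The sketch below takes $Y = X = B$ on a simulated Brownian path;
the seed, the horizon $T$ and the number of partition points are arbitrary
illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1.0, 100_000
dB = rng.standard_normal(n) * np.sqrt(T / n)    # Brownian increments
B = np.concatenate(([0.0], np.cumsum(dB)))      # B(t_0), ..., B(t_n)

ito_sum = np.sum(B[:-1] * dB)                    # left end-point values
strat_sum = np.sum(0.5 * (B[:-1] + B[1:]) * dB)  # average of the end-points

# The Stratonovich sum telescopes to B(T)^2 / 2, as in ordinary calculus.
# The Ito sum is smaller by half the sum of squared increments, about T / 2.
gap = strat_sum - ito_sum
```

The Stratonovich sum reproduces the ordinary-calculus answer $B^2(T)/2$ up to
rounding, while the gap between the two sums is close to $T/2$, half the
quadratic variation of $B$ on $[0,T]$.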


Definition 5.17 Let $X$ and $Y$ be continuous adapted processes, such that
the stochastic integral $\int_0^t Y(s)\,dX(s)$ is defined. The Stratonovich integral is
defined by

$$\int_0^t Y(s)\,\partial X(s) = \int_0^t Y(s)\,dX(s) + \frac{1}{2}[Y,X](t). \tag{5.66}$$

The Stratonovich differential is defined by

$$Y(t)\,\partial X(t) = Y(t)\,dX(t) + \frac{1}{2}\,d[Y,X](t). \tag{5.67}$$


Integration by Parts: Stratonovich Product rule


Theorem 5.18 Provided all terms below are defined,

$$X(t)Y(t) - X(0)Y(0) = \int_0^t X(s)\,\partial Y(s) + \int_0^t Y(s)\,\partial X(s), \tag{5.68}$$

$$\partial\big(X(t)Y(t)\big) = X(t)\,\partial Y(t) + Y(t)\,\partial X(t). \tag{5.69}$$


PROOF: The proof is the direct application of the stochastic product rule,

$$\begin{aligned}
d\big(X(t)Y(t)\big) &= X(t)\,dY(t) + Y(t)\,dX(t) + d[X,Y](t) \\
&= X(t)\,dY(t) + \frac{1}{2}\,d[X,Y](t) + Y(t)\,dX(t) + \frac{1}{2}\,d[X,Y](t) \\
&= X(t)\,\partial Y(t) + Y(t)\,\partial X(t).
\end{aligned}$$

Change of Variables: Stratonovich Chain rule


Theorem 5.19 Let $X$ be continuous and $f$ three times continuously
differentiable (in $C^3$), then

$$f(X(t)) - f(X(0)) = \int_0^t f'(X(s))\,\partial X(s), \tag{5.70}$$

$$\partial f(X(t)) = f'(X(t))\,\partial X(t).$$

PROOF: By Itô's formula $f(X(t))$ is a semimartingale, and by definition of
the stochastic integral $f(X(t)) - f(X(0)) = \int_0^t df(X(s))$. By Itô's formula

$$df(X(t)) = f'(X(t))\,dX(t) + \frac{1}{2} f''(X(t))\,d[X,X](t).$$

Let $Y(t) = f'(X(t))$. Then according to (5.67) it is enough to show that
$d[Y,X](t) = f''(X(t))\,d[X,X](t)$. But this follows by Itô's formula as

$$dY(t) = df'(X(t)) = f''(X(t))\,dX(t) + \frac{1}{2} f'''(X(t))\,d[X,X](t),$$

and

$$d[Y,X](t) = dY(t)\,dX(t) = f''(X(t))\,dX(t)\,dX(t) = f''(X(t))\,d[X,X](t),$$

as needed.


Example 5.18: If $B(t)$ is Brownian motion, then its Stratonovich stochastic
differential is

$$\partial B^2(t) = 2B(t)\,\partial B(t),$$

as compared to the Itô differential

$$dB^2(t) = 2B(t)\,dB(t) + dt.$$


Conversion of Stratonovich SDEs into Itô SDEs


Theorem 5.20 Suppose that $X(t)$ satisfies the following SDE in the Stratonovich
sense

$$dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,\partial B(t), \tag{5.71}$$

with $\sigma(x)$ twice continuously differentiable. Then $X(t)$ satisfies the Itô SDE

$$dX(t) = \Big(\mu(X(t)) + \frac{1}{2}\sigma'(X(t))\sigma(X(t))\Big)dt + \sigma(X(t))\,dB(t). \tag{5.72}$$

Thus the infinitesimal drift coefficient in the Itô diffusion is $\mu(x) + \frac{1}{2}\sigma'(x)\sigma(x)$
and the diffusion coefficient is the same $\sigma(x)$.


PROOF: By the definition of the Stratonovich integral $X(t)$ satisfies

$$dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dB(t) + \frac{1}{2}\,d[\sigma(X),B](t). \tag{5.73}$$

Since $[\sigma(X),B](t)$ is a finite variation process, it follows that $X(t)$ solves a
diffusion type SDE with the same diffusion coefficient $\sigma(X(t))$. Computing
formally the bracket, we have

$$d[\sigma(X),B](t) = d\sigma(X(t))\,dB(t).$$

Applying Itô's formula

$$d\sigma(X(t)) = \sigma'(X(t))\,dX(t) + \frac{1}{2}\sigma''(X(t))\,d[X,X](t).$$

It follows from (5.73) that

$$d[X,B](t) = dX(t)\,dB(t) = \sigma(X(t))\,dt,$$

therefore

$$d[\sigma(X),B](t) = d\sigma(X(t))\,dB(t) = \sigma'(X(t))\,dX(t)\,dB(t) = \sigma'(X(t))\sigma(X(t))\,dt.$$

Equation (5.72) now follows from (5.73).
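
A short worked case may help to see the conversion in action. The
coefficients below, $\mu = 0$ and $\sigma(x) = \sqrt{1+x^2}$ with $X(0) = 0$, are chosen
for illustration only and do not come from the text.

```latex
% Stratonovich SDE (illustrative choice of coefficients):
\partial X(t) = \sqrt{1 + X^2(t)}\;\partial B(t), \qquad X(0) = 0.
% Ito correction: (1/2) sigma'(x) sigma(x)
%   = (1/2) * x / sqrt(1+x^2) * sqrt(1+x^2) = x/2, so by (5.72)
dX(t) = \tfrac{1}{2} X(t)\,dt + \sqrt{1 + X^2(t)}\;dB(t).
% By the Stratonovich chain rule, X(t) = sinh(B(t)) solves the first equation:
\partial \sinh(B(t)) = \cosh(B(t))\,\partial B(t) = \sqrt{1 + \sinh^2(B(t))}\;\partial B(t).
% Ito's formula applied to sinh(B(t)) gives the second equation:
d \sinh(B(t)) = \cosh(B(t))\,dB(t) + \tfrac{1}{2}\sinh(B(t))\,dt.
```

The two computations agree, as the conversion theorem predicts: the same
process solves the Stratonovich equation without drift and the Itô equation
with drift $\frac{1}{2}x$.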


Notes. Proofs and other details can be found in Dynkin (1965), Friedman 
(1975), Karatzas and Shreve (1988), Stroock and Varadhan (1979). 


5.10 Exercises 


Exercise 5.1: (Gaussian diffusions.) Show that if $X(t)$ satisfies the SDE
$dX(t) = a(t)\,dt + b(t)\,dB(t)$, with deterministic bounded coefficients $a(t)$ and
$b(t)$, such that $\int_0^T |a(t)|\,dt < \infty$ and $\int_0^T b^2(t)\,dt < \infty$, then $X(t)$ is a Gaussian
process with independent Gaussian increments.


Exercise 5.2: Give the SDEs for $X(t) = \cos(B(t))$ and $Y(t) = \sin(B(t))$.


Exercise 5.3: Solve the SDE $dX(t) = B(t)X(t)\,dt + B(t)X(t)\,dB(t)$, $X(0) = 1$.


Exercise 5.4: Solve the SDE dX (t) = X(t)dt + B(t)dB(t), X(0) = 1. Com- 
ment whether it is a diffusion type SDE. 


Exercise 5.5: Find $d\big(\mathcal{E}(B)(t)\big)^2$.


Exercise 5.6: Let $X(t)$ satisfy $dX(t) = X^2(t)\,dt + X(t)\,dB(t)$, $X(0) = 1$.
Show that $X(t)$ satisfies $X(t) = \exp\big(\int_0^t (X(s) - \frac{1}{2})\,ds + B(t)\big)$.


Exercise 5.7: By definition, the stochastic logarithm satisfies $\mathcal{L}(\mathcal{E}(X)) = X$.
Show that, provided $U(t) \ne 0$ for any $t$, $\mathcal{E}(\mathcal{L}(U)) = U$.


Exercise 5.8: Find the stochastic logarithm of $B^2(t) + 1$.


Exercise 5.9: Let $B(t)$ be a $d$-dimensional Brownian motion, and $H(t)$ a
$d$-dimensional regular adapted process. Show that

$$\mathcal{E}\Big(\int_0^{\cdot} H(s)\,dB(s)\Big)(t)
= \exp\Big(\int_0^t H(s)\,dB(s) - \frac{1}{2}\int_0^t |H(s)|^2\,ds\Big).$$


Exercise 5.10: Find the transition probability function P(y, t, x, s) for Brow- 
nian motion with drift B(t) + t. 


Exercise 5.11: Show that under the assumptions of Theorem 5.15 the
transition function $P(y,t,x,s)$ satisfies the backward equation. Give also the
forward equation for $P(y,t,x,s)$ and explain why it requires extra smoothness
conditions on the coefficients $\mu(x,t)$ and $\sigma(x,t)$ for it to hold.


Exercise 5.12: Let $X(t)$ satisfy the following stochastic differential equation
for $0 \le t \le T$, $dX(t) = \sqrt{X(t) + 1}\,dB(t)$, and $X(0) = 0$. Assuming that Itô
integrals are martingales, find $EX(t)$ and $E(X^2(t))$. Let $m(u,t) = Ee^{uX(t)}$ be
the moment generating function of $X(t)$. Show that it satisfies the PDE

$$\frac{\partial m}{\partial t} = \frac{u^2}{2}\frac{\partial m}{\partial u} + \frac{u^2}{2}\,m.$$


Exercise 5.13: Solve the following Stratonovich stochastic differential
equation $\partial U = U\,\partial B$, $U(0) = 1$, where $B(t)$ is Brownian motion.


Chapter 6 


Diffusion Processes 


In this chapter various properties of solutions of stochastic differential equa- 
tions are studied. The approach taken here relies on martingales obtained 
by means of Itô’s formula. Relationships between stochastic differential equa- 
tions (SDEs) and partial differential equations (PDEs) are given, but no prior 
knowledge of PDEs is required. Solutions to SDEs are referred to as diffusions. 


6.1 Martingales and Dynkin’s Formula 


Itô's formula provides a source for construction of martingales. Let $X(t)$ solve
the stochastic differential equation (SDE)

$$dX(t) = \mu(X(t),t)\,dt + \sigma(X(t),t)\,dB(t), \quad\text{for } t \ge 0, \tag{6.1}$$

and $L_t$ be the generator of $X(t)$, that is, the second order differential operator
associated with SDE (6.1),

$$L_t f(x,t) = (L_t f)(x,t) = \frac{1}{2}\sigma^2(x,t)\frac{\partial^2 f}{\partial x^2}(x,t)
+ \mu(x,t)\frac{\partial f}{\partial x}(x,t). \tag{6.2}$$

Itô's formula (4.65) takes a compact form.


Theorem 6.1 For any function $f(x,t)$ twice continuously differentiable in $x$
and once in $t$

$$df(X(t),t) = \Big(L_t f(X(t),t) + \frac{\partial f}{\partial t}(X(t),t)\Big)dt
+ \frac{\partial f}{\partial x}(X(t),t)\,\sigma(X(t),t)\,dB(t). \tag{6.3}$$


Since, under appropriate conditions, the Itô integral is a martingale (see
Theorem 4.7), by isolating the Itô integral martingales are obtained.

To illustrate this simple idea, let $f$ have a bounded (by $K$) derivative, and
use Itô's formula for $f(B(t))$. Then

$$f(B(t)) = f(0) + \int_0^t \frac{1}{2} f''(B(s))\,ds + \int_0^t f'(B(s))\,dB(s).$$

The Itô integral $\int_0^t f'(B(s))\,dB(s)$ is a martingale on $[0,T]$, because condition
(4.10) holds, $\int_0^T E\big(f'(B(s))\big)^2 ds \le K^2 T < \infty$. Thus $f(B(t)) - \int_0^t \frac{1}{2} f''(B(s))\,ds$
is a martingale. The next result is more general.


Theorem 6.2 Let $X(t)$ be a solution to SDE (6.1) with coefficients satisfying
the conditions of Theorem 5.4, that is, $\mu(x,t)$ and $\sigma(x,t)$ are Lipschitz in $x$ with
the same constant for all $t$, and satisfy the linear growth condition (1.30),
$|\mu(x,t)| + |\sigma(x,t)| \le K(1 + |x|)$. If $f(x,t)$ is a twice continuously differentiable
in $x$ and once in $t$ function ($C^{2,1}$) with bounded first derivative in $x$, then the
process

$$M_f(t) = f(X(t),t) - \int_0^t \Big(L_u f + \frac{\partial f}{\partial t}\Big)(X(u),u)\,du \tag{6.4}$$

is a martingale.


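PROOF: By Itô's formula

$$M_f(t) = f(X(0),0) + \int_0^t \frac{\partial f}{\partial x}(X(u),u)\,\sigma(X(u),u)\,dB(u). \tag{6.5}$$

By assumption $\frac{\partial f}{\partial x}(x,u)$ is bounded for all $x$ and $u$, $\big(\frac{\partial f}{\partial x}(x,u)\big)^2 \le K_1$.
Therefore

$$\int_0^T E\Big(\frac{\partial f}{\partial x}(X(u),u)\,\sigma(X(u),u)\Big)^2 du
\le K_1 \int_0^T E\big(\sigma^2(X(u),u)\big)\,du. \tag{6.6}$$

Using the linear growth condition,

$$\int_0^T E\Big(\frac{\partial f}{\partial x}(X(u),u)\,\sigma(X(u),u)\Big)^2 du
\le 2K_1 K^2 T\Big(1 + E\big(\sup_{u \le T} X^2(u)\big)\Big). \tag{6.7}$$

But $E\big(\sup_{u \le T} X^2(u)\big) < \infty$ by the existence and uniqueness result,
Theorem 5.4, therefore the expression in (6.7) is finite. Thus the Itô integral is a
martingale by Theorem 4.7.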


The condition of bounded partial derivative of $f$ can be replaced by the
exponential growth condition (see for example, Pinsky (1995), Theorem 1.6.3).


Theorem 6.3 Let $X(t)$ satisfy conditions of the previous Theorem 6.2. If
$|X(0)|$ possesses a moment generating function, $Ee^{u|X(0)|} < \infty$ for all real $u$,
then so does $|X(t)|$, $Ee^{u|X(t)|} < \infty$ for all $t \ge 0$. In this case

$$M_f(t) = f(X(t),t) - \int_0^t \Big(L_u f + \frac{\partial f}{\partial t}\Big)(X(u),u)\,du \tag{6.8}$$

is a martingale for all $f(x,t) \in C^{2,1}$ satisfying the following condition: for
any $t$, there exist constants $c_t$ and $k_t$ such that for all $x$, all $t > 0$, and all
$0 \le u \le t$

$$\max\Big(\Big|\frac{\partial f(x,u)}{\partial t}\Big|, \Big|\frac{\partial f(x,u)}{\partial x}\Big|,
\Big|\frac{\partial^2 f(x,u)}{\partial x^2}\Big|\Big) \le c_t e^{k_t |x|}. \tag{6.9}$$

PROOF: The proof is given in Pinsky (1995) for bounded coefficients of
the SDE, but it can be extended for this case. We give the proof when the
diffusion $X(t) = B(t)$ is Brownian motion. Let $X(t) = B(t)$, then by Itô's
formula $M_f(t)$ is given by

$$M_f(t) = f(0,0) + \int_0^t \frac{\partial f}{\partial x}(B(s),s)\,dB(s). \tag{6.10}$$

By the bound on $\big|\frac{\partial f}{\partial x}\big|$, for $s \le t$

$$E\Big(\frac{\partial f}{\partial x}(B(s),s)\Big)^2 \le c_t^2\, E\big(e^{2k_t |B(s)|}\big).$$

Writing the last expectation as an integral with respect to the density of the
$N(0,s)$ distribution, it is evident that it is finite, and its integral over $[0,t]$ is
finite,

$$\int_0^t E\Big(\frac{\partial f}{\partial x}(B(s),s)\Big)^2 ds < \infty. \tag{6.11}$$

By the martingale property of Itô integrals, (6.11) implies that the Itô integral
in (6.10) is a martingale. One can prove the result without use of Itô's formula,
by doing calculations of integrals with respect to Normal densities (see for
example, Rogers and Williams (1987) p.36).


Corollary 6.4 Let $f(x,t)$ solve the backward equation

$$L_t f(x,t) + \frac{\partial f}{\partial t}(x,t) = 0, \tag{6.12}$$

and conditions of either of the two theorems above hold. Then $f(X(t),t)$ is a
martingale.


Example 6.1: Let $X(t) = B(t)$, then $(Lf)(x) = \frac{1}{2} f''(x)$. Solutions to $Lf = 0$ are
linear functions $f(x) = ax + b$. Hence $f(B(t)) = aB(t) + b$ is a martingale, which is
also obvious from the fact that $B(t)$ is a martingale.


Example 6.2: Let $X(t) = B(t)$. The function $f(x,t) = e^{x - t/2}$ solves the backward
equation

$$\frac{1}{2}\frac{\partial^2 f}{\partial x^2}(x,t) + \frac{\partial f}{\partial t}(x,t) = 0. \tag{6.13}$$

Therefore, by the above Corollary 6.4 we recover the exponential martingale of
Brownian motion $e^{B(t) - t/2}$.


Corollary 6.5 (Dynkin's Formula) Let $X(t)$ satisfy (6.1). If the
conditions of either of the above theorems hold, then for any $t$, $0 \le t \le T$,

$$E f(X(t),t) = f(X(0),0) + E\int_0^t \Big(L_u f + \frac{\partial f}{\partial t}\Big)(X(u),u)\,du. \tag{6.14}$$

The result is also true if $t$ is replaced by a bounded stopping time $\tau$, $0 \le \tau \le T$.


PROOF: The bounds on the growth of the function and its partial derivatives
are used to establish integrability of $f(X(t),t)$ and other terms in (6.14). Since
$M_f(t)$ is a martingale, the result follows by taking expectations. For bounded
stopping times the result follows by the Optional Stopping Theorem, given in
Chapter 7.
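
As a simple illustration of Dynkin's formula, take $X(t) = B(t)$ and
$f(x,t) = f(x) = x^4$. This choice is an assumption made for the example: $f$ and
its derivatives grow polynomially, so the exponential growth condition (6.9)
holds.

```latex
% Generator of Brownian motion applied to f(x) = x^4:
L f(x) = \tfrac{1}{2} f''(x) = 6x^2, \qquad \frac{\partial f}{\partial t} = 0.
% Dynkin's formula (6.14) with B(0) = 0 and E B^2(u) = u:
E B^4(t) = 0 + E\int_0^t 6 B^2(u)\,du = 6\int_0^t u\,du = 3t^2 .
```

This agrees with the fourth moment of the $N(0,t)$ distribution.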


Example 6.3: We show that $I = \int_0^1 s\,dB(s)$ has a Normal $N(0, 1/3)$ distribution,
by finding its moment generating function $m(u) = E(e^{uI})$. Consider the Itô integral
$X(t) = \int_0^t s\,dB(s)$, $t \le 1$, and notice that $I = X(1)$. As $dX(t) = t\,dB(t)$, $X(t)$ is an Itô
process with $\mu(x,t) = 0$ and $\sigma(x,t) = t$. Take $f(x,t) = f(x) = e^{ux}$. This function
satisfies conditions of Theorem 6.3. It is easy to see that $L_t f(x,t) = \frac{1}{2} t^2 u^2 e^{ux}$; note
that $\frac{\partial f}{\partial t} = 0$. Therefore by Dynkin's formula

$$E\big(e^{uX(t)}\big) = 1 + \frac{1}{2} u^2 \int_0^t s^2 E\big(e^{uX(s)}\big)\,ds.$$

Denote $h(t) = E(e^{uX(t)})$, then differentiation with respect to $t$ leads to a simple
equation

$$h'(t) = \frac{1}{2} u^2 t^2 h(t), \quad\text{with } h(0) = 1.$$

By separating variables, $\log h(t) = \frac{1}{2} u^2 \int_0^t s^2\,ds = \frac{1}{6} u^2 t^3$. Thus $h(t) = e^{u^2 t^3/6}$,
which corresponds to the $N(0, t^3/3)$ distribution. Thus $X(t) = \int_0^t s\,dB(s)$ has the
$N(0, t^3/3)$ distribution, and the result follows.
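
The variance $1/3$ can also be seen in a crude simulation. The sketch below
approximates $\int_0^1 s\,dB(s)$ by Itô sums on simulated Brownian paths and, using
the same paths, approximates $\int_0^1 B(t)\,dt$ by Riemann sums. The seed, the
number of paths and the number of time steps are arbitrary illustrative
choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps = 10_000, 200
dt = 1.0 / n_steps
t_left = np.arange(n_steps) * dt                       # left end-points t_i
dB = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
B = np.cumsum(dB, axis=1)                              # B(t_1), ..., B(t_n)

ito_sums = np.sum(t_left * dB, axis=1)   # Ito sums for int_0^1 s dB(s)
time_ints = np.sum(B, axis=1) * dt       # Riemann sums for int_0^1 B(t) dt

var_ito = ito_sums.var()
var_time = time_ints.var()
```

Both sample variances should be near $1/3$, up to Monte Carlo error and a
discretization bias of order $1/n$, and both sample means should be near zero.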

Example 6.4: We prove that $\int_0^1 B(t)\,dt$ has the $N(0, 1/3)$ distribution, see also
Example 3.6. Using integration by parts,
$\int_0^1 B(t)\,dt = B(1) - \int_0^1 t\,dB(t) = \int_0^1 dB(t) - \int_0^1 t\,dB(t) = \int_0^1 (1-t)\,dB(t)$,
and the result follows.


6.2 Calculation of Expectations and PDEs


Results in this section provide a method for calculation of expectations of
a function or a functional of a diffusion process on the boundary. This
expectation can be computed by using a solution to the corresponding partial
differential equation with a given boundary condition. This connection shows
that solutions to PDEs can be represented as functions (functionals) of the
corresponding diffusion.

Let $X(t)$ be a diffusion satisfying the SDE for $t > s \ge 0$,

$$dX(t) = \mu(X(t),t)\,dt + \sigma(X(t),t)\,dB(t), \quad\text{and } X(s) = x. \tag{6.15}$$


Backward PDE and $E(g(X(T)) \mid X(t) = x)$


We give results on $E(g(X(T)) \mid X(t) = x)$. Observe first that $g(X(T))$ must be
integrable ($E|g(X(T))| < \infty$) for this to make sense. Of course, if $g$ is bounded,
then this is true. Observe next that by the Markov property of $X(t)$,

$$E\big(g(X(T)) \mid X(t)\big) = E\big(g(X(T)) \mid \mathcal{F}_t\big).$$

The latter is a martingale, by Theorem 2.31. The last ingredient is Itô's
formula, which connects this to the PDE. Again, care is taken for Itô's formula
to produce a martingale term, for which assumptions on the function and its
derivatives are needed, see Theorem 6.3. Apart from these requirements, the
results are elegant and easy to derive.


Theorem 6.6 Let $f(x,t)$ solve the backward equation, with $L_t$ given by (6.2),

$$L_t f(x,t) + \frac{\partial f}{\partial t}(x,t) = 0, \quad\text{with } f(x,T) = g(x). \tag{6.16}$$

If $f(x,t)$ satisfies the conditions of Corollary 6.4, then

$$f(x,t) = E\big(g(X(T)) \mid X(t) = x\big).$$


PROOF: By Corollary 6.4 $f(X(t),t)$, $s \le t \le T$, is a martingale. The
martingale property gives

$$E\big(f(X(T),T) \mid \mathcal{F}_t\big) = f(X(t),t). \tag{6.17}$$

On the boundary $f(x,T) = g(x)$, so that $f(X(T),T) = g(X(T))$, and

$$f(X(t),t) = E\big(g(X(T)) \mid \mathcal{F}_t\big).$$

By the Markov property of $X(t)$, $f(X(t),t) = E(g(X(T)) \mid X(t))$, and the
result follows.


It is tempting to show that the expectation $E(g(X(T)) \mid X(t) = x) = f(x,t)$
satisfies the backward PDE, thus establishing existence of its solutions.
Assume for a moment that we can use Itô's formula with $f(X(t),t)$, then

$$f(X(t),t) = f(X(0),0) + \int_0^t \Big(L_s f + \frac{\partial f}{\partial s}\Big)(X(s),s)\,ds + M(t),$$

where $M(t)$ is a martingale. As noticed earlier, $f(X(t),t) = E(g(X(T)) \mid \mathcal{F}_t)$
is a martingale. It follows that $\int_0^t (L_s f + \frac{\partial f}{\partial s})(X(s),s)\,ds$ is a martingale as a
difference of two martingales. Since the integral with respect to $ds$ is a function
of finite variation, and a martingale is not, the latter can only be true if the
integral is zero, implying the backward equation (6.16).

To make this argument precise one needs to establish the validity of Itô's
formula, i.e. smoothness of the conditional expectation. This can be seen by
writing the expectation as an integral with respect to the density function,

$$f(x,t) = E\big(g(X(T)) \mid X(t) = x\big) = \int g(y)\,p(y,T,x,t)\,dy, \tag{6.18}$$

where $p(y,T,x,t)$ is the transition probability density. So $x$ is now in the
transition function and the smoothness in $x$ follows. For Brownian motion
$p(y,T,x,t) = \frac{1}{\sqrt{2\pi(T-t)}}\, e^{-\frac{(y-x)^2}{2(T-t)}}$ is differentiable in $x$ (infinitely many times) and
the result follows by differentiating under the integral. For other Gaussian
diffusions the argument is similar. It is harder to show this in the general case
(see Theorem 5.16).
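
For a concrete instance of Theorem 6.6, take $X(t) = B(t)$ and the terminal
function $g(x) = x^2$. This choice is an assumption made for illustration; the
candidate solution below has polynomial growth, so the exponential growth
condition of Theorem 6.3 applies.

```latex
% Candidate solution of the backward equation with f(x,T) = x^2:
f(x,t) = x^2 + (T - t).
% Check of (6.16) with L = (1/2) d^2/dx^2:
\tfrac{1}{2}\frac{\partial^2 f}{\partial x^2}(x,t) + \frac{\partial f}{\partial t}(x,t)
   = \tfrac{1}{2}\cdot 2 - 1 = 0, \qquad f(x,T) = x^2 .
% Direct computation, using B(T) = x + (B(T) - B(t)) with an independent
% N(0, T-t) increment:
E\big(B^2(T) \mid B(t) = x\big) = x^2 + (T - t) = f(x,t).
```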


Remark 6.1: Theorem 6.6 shows that any solution of the backward equation 
with the boundary condition given by g(x) is given by the integral of g with 
respect to the transition probability density, equation (6.18), affirming that the 
transition probability density is a fundamental solution (see Definition 5.14). 


A result similar to Theorem 6.6 is obtained when zero in the right-hand side of
the backward equation is replaced by a known function $-\phi$.


Theorem 6.7 Let $f(x,t)$ solve

$$L_t f(x,t) + \frac{\partial f}{\partial t}(x,t) = -\phi(x), \quad\text{with } f(x,T) = g(x). \tag{6.19}$$

Then

$$f(x,t) = E\Big(g(X(T)) + \int_t^T \phi(X(s))\,ds \,\Big|\, X(t) = x\Big). \tag{6.20}$$

The proof is similar to the above Theorem 6.6 and is left as an exercise.


Feynman-Kac formula

A result more general than Theorem 6.6 is given by the Feynman-Kac formula.


Theorem 6.8 (Feynman-Kac Formula) For given bounded functions $r(x,t)$
and $g(x)$ let

$$C(x,t) = E\Big(e^{-\int_t^T r(X(u),u)\,du}\, g(X(T)) \,\Big|\, X(t) = x\Big). \tag{6.21}$$

Assume that there is a solution to

$$\frac{\partial f}{\partial t}(x,t) + L_t f(x,t) = r(x,t) f(x,t), \quad\text{with } f(x,T) = g(x). \tag{6.22}$$

Then the solution is unique and $C(x,t)$ is that solution.


PROOF: We give a sketch of the proof by using Itô's formula coupled with
solutions of a linear SDE. Take a solution to (6.22) and apply Itô's formula

$$df(X(t),t) = \Big(\frac{\partial f}{\partial t}(X(t),t) + L_t f(X(t),t)\Big)dt
+ \frac{\partial f}{\partial x}(X(t),t)\,\sigma(X(t),t)\,dB(t).$$

The last term is a martingale term, so write it as $dM(t)$. Now use (6.22) to
obtain

$$df(X(t),t) = r(X(t),t)\,f(X(t),t)\,dt + dM(t).$$

This is a linear SDE of Langevin type for $f(X(t),t)$ where $B(t)$ is replaced
by $M(t)$. Integrating this SDE between $t$ and $T$, and using $T \ge t$ as a time
variable and $t$ as the origin (see (5.32) in Section 5.22) we obtain

$$f(X(T),T) = f(X(t),t)\,e^{\int_t^T r(X(u),u)\,du}
+ e^{\int_t^T r(X(u),u)\,du} \int_t^T e^{-\int_t^s r(X(u),u)\,du}\,dM(s).$$

But $f(X(T),T) = g(X(T))$, and rearranging, we obtain

$$f(X(t),t) = g(X(T))\,e^{-\int_t^T r(X(u),u)\,du}
- \int_t^T e^{-\int_t^s r(X(u),u)\,du}\,dM(s).$$

As the last term is an integral of a bounded function with respect to a martingale,
it is itself a martingale with zero mean. Taking expectation given $X(t) = x$,
we obtain that $C(x,t) = f(x,t)$. For other proofs see for example, Friedman
(1975) and Pinsky (1995).


Remark 6.2: The expression $e^{-r(T-t)} E(g(X(T)) \mid X(t) = x)$ occurs in
Finance as a discounted expected payoff, where $r$ is a constant. The discounting
results in the term $rf$ in the right-hand side of the backward PDE.


Example 6.5: Give a probabilistic representation of the solution $f(x,t)$ of the PDE

$$\frac{1}{2}\sigma^2 x^2 \frac{\partial^2 f}{\partial x^2} + \mu x \frac{\partial f}{\partial x}
+ \frac{\partial f}{\partial t} = rf, \quad 0 \le t \le T, \quad f(x,T) = x^2, \tag{6.23}$$

where $\sigma$, $\mu$ and $r$ are positive constants. Solve this PDE using the solution of the
corresponding stochastic differential equation.

The SDE corresponding to $L$ is $dX(t) = \mu X(t)\,dt + \sigma X(t)\,dB(t)$. Its solution is
$X(t) = X(0)\,e^{(\mu - \sigma^2/2)t + \sigma B(t)}$. By the Feynman-Kac formula

$$f(x,t) = E\big(e^{-r(T-t)} X^2(T) \mid X(t) = x\big) = e^{-r(T-t)} E\big(X^2(T) \mid X(t) = x\big).$$

Using $X(T) = X(t)\,e^{(\mu - \sigma^2/2)(T-t) + \sigma(B(T) - B(t))}$, we obtain
$E(X^2(T) \mid X(t) = x) = x^2 e^{(2\mu + \sigma^2)(T-t)}$, giving $f(x,t) = x^2 e^{(2\mu + \sigma^2 - r)(T-t)}$.
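
The closed form of Example 6.5 can be cross-checked by evaluating the
Feynman-Kac expectation numerically. The sketch below uses Gauss-Hermite
quadrature over a standard Normal variable; the function names and the
parameter values are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def feynman_kac_quadrature(x, t, T, mu, sigma, r, n_nodes=40):
    """Evaluate e^{-r(T-t)} E(X(T)^2 | X(t) = x) for geometric Brownian motion
    by Gauss-Hermite quadrature over Z ~ N(0, 1)."""
    tau = T - t
    z, w = hermegauss(n_nodes)          # nodes/weights for weight exp(-z^2/2)
    x_T = x * np.exp((mu - 0.5 * sigma**2) * tau + sigma * np.sqrt(tau) * z)
    expectation = np.sum(w * x_T**2) / np.sqrt(2.0 * np.pi)
    return np.exp(-r * tau) * expectation

def closed_form(x, t, T, mu, sigma, r):
    """f(x,t) = x^2 exp((2 mu + sigma^2 - r)(T - t)) from Example 6.5."""
    return x**2 * np.exp((2.0 * mu + sigma**2 - r) * (T - t))

# illustrative (hypothetical) parameter values
args = dict(x=1.5, t=0.25, T=1.0, mu=0.05, sigma=0.2, r=0.03)
numeric = feynman_kac_quadrature(**args)
exact = closed_form(**args)
```

The two values should agree to high accuracy, since the integrand is a smooth
exponential of the Gaussian variable.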


The following result shows that $f(x,t) = E(g(X(T)) \mid X(t) = x)$ satisfies the
backward PDE, and can be found in Gihman and Skorohod (1972), Friedman
(1975). (See also Theorem 6.5.3 Friedman (1975) for $C(x,t)$.)


Theorem 6.9 (Kolmogorov's Equation) Let $X(t)$ be a diffusion with
generator $L_t$. Assume that the coefficients $\mu(x,t)$ and $\sigma(x,t)$ of $L_t$ are locally
Lipschitz and satisfy the linear growth condition (see (1.30)). Assume in
addition that they possess continuous partial derivatives with respect to $x$ up to
order two, and that they have at most polynomial growth (see (1.31)). If $g(x)$
is twice continuously differentiable and satisfies together with its derivatives a
polynomial growth condition, then the function $f(x,t) = E(g(X(T)) \mid X(t) = x)$
satisfies

$$\frac{\partial f}{\partial t}(x,t) + L_t f(x,t) = 0, \quad\text{in the region } 0 \le t < T,\ x \in \mathbb{R}, \tag{6.24}$$

with boundary condition $f(x,T) = \lim_{t \uparrow T} f(x,t) = g(x)$.


6.3 Time Homogeneous Diffusions 


The case of time-independent coefficients in SDEs corresponds to the so-called 
time-homogeneous diffusions, 


$$dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dB(t). \tag{6.25}$$


Theorem 6.10 Assume that there is a unique weak solution to (6.25). Then
the transition probability function of the solution $P(y,t,x,s) = P(y,t-s,x,0)$
depends only on $t - s$.


PROOF: Denote by $(X, B)$ the weak solution to (6.25). By the definition of
the transition function $P(y,t,x,s) = P(X(t) \le y \mid X(s) = x) = P(X_s^x(t) \le y)$,
where the process $X_s^x(t)$ satisfies $X_s^x(s) = x$ and for $t > 0$

$$X_s^x(s+t) = x + \int_s^{s+t} \mu(X_s^x(u))\,du + \int_s^{s+t} \sigma(X_s^x(u))\,dB(u). \tag{6.26}$$

Let $Y(t) = X_s^x(s+t)$, and $B_1(t) = B(s+t) - B(s)$, $t \ge 0$. Then $B_1(t)$ is a
Brownian motion and from the above equation, $Y(t)$ satisfies for $t \ge 0$

$$Y(t) = x + \int_0^t \mu(Y(v))\,dv + \int_0^t \sigma(Y(v))\,dB_1(v), \quad\text{and } Y(0) = x. \tag{6.27}$$

Put $s = 0$ in (6.26) to obtain

$$X_0^x(t) = x + \int_0^t \mu(X_0^x(v))\,dv + \int_0^t \sigma(X_0^x(v))\,dB(v), \quad\text{and } X_0^x(0) = x. \tag{6.28}$$

Thus $Y(t)$ and $X_0^x(t)$ satisfy the same SDE. Hence, by uniqueness of the weak
solution, $Y(t)$ and $X_0^x(t)$ have the same distribution. Therefore for $t > s$

$$P(y,t,x,s) = P(X_s^x(t) \le y) = P(Y(t-s) \le y) = P(X_0^x(t-s) \le y) = P(y,t-s,x,0).$$

Since the transition function of a homogeneous diffusion depends on $t$ and $s$
only through $t - s$, it is denoted as

$$P(t,x,y) = P(y,t+s,x,s) = P(y,t,x,0) = P(X(t) \le y \mid X(0) = x), \tag{6.29}$$

and it gives the probability for the process to go from $x$ to $(-\infty, y]$ during
time $t$. Its density $p(t,x,y)$, when it exists, is the density of the conditional
distribution of $X(t)$ given $X(0) = x$.

The generator $L$ of a time-homogeneous diffusion is given by

$$Lf(x) = \frac{1}{2}\sigma^2(x) f''(x) + \mu(x) f'(x). \tag{6.30}$$

Under appropriate conditions (conditions (A1) and (A2) of Theorem 5.15)
$p(t,x,y)$ is the fundamental solution of the backward equation (5.61), which
becomes in this case

$$\frac{\partial p}{\partial t} = Lp = \frac{1}{2}\sigma^2(x)\frac{\partial^2 p}{\partial x^2} + \mu(x)\frac{\partial p}{\partial x}. \tag{6.31}$$
If moreover, $\sigma(x)$ and $\mu(x)$ have derivatives, $\sigma'(x)$, $\mu'(x)$, and $\sigma''(x)$, which
are bounded and satisfy a Hölder condition, then $p(t,x,y)$ satisfies the forward
equation in $t$ and $y$ for any fixed $x$, which becomes

$$\frac{\partial p}{\partial t}(t,x,y) = \frac{1}{2}\frac{\partial^2}{\partial y^2}\big(\sigma^2(y)p(t,x,y)\big) - \frac{\partial}{\partial y}\big(\mu(y)p(t,x,y)\big). \tag{6.32}$$

In terms of the generator, the backward and the forward equations are written
as

$$\frac{\partial p}{\partial t} = Lp, \qquad \frac{\partial p}{\partial t} = L^* p, \tag{6.33}$$

where

$$L^* f(y) = \frac{1}{2}\frac{\partial^2}{\partial y^2}\big(\sigma^2(y)f(y)\big) - \frac{\partial}{\partial y}\big(\mu(y)f(y)\big)$$

denotes the operator appearing in equation (6.32), and is known as the
adjoint operator to $L$. (The adjoint operator is defined by the requirement that
whenever the following integrals exist, $\int g(x)Lf(x)\,dx = \int f(x)L^*g(x)\,dx$.)


Example 6.6: The generator of Brownian motion $L = \frac{1}{2}\frac{d^2}{dx^2}$ is called the Laplacian.
The backward equation for the transition probability density is

$$\frac{\partial p}{\partial t} = Lp = \frac{1}{2}\frac{\partial^2 p}{\partial x^2}. \tag{6.34}$$

Since the distribution of $B(t)$ when $B(0) = x$ is $N(x,t)$, the transition probability
density is given by the density of $N(x,t)$, and is the fundamental solution of PDE
(6.34)

$$p(t,x,y) = \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{(y-x)^2}{2t}}.$$

The adjoint operator $L^*$ is the same as $L$, so that $L$ is self-adjoint. A stronger result
than Theorem 6.9 holds. It is possible to show, see for example, Karatzas and Shreve
(1988) p.255, that if $\int e^{-ay^2}|g(y)|\,dy < \infty$ for some $a > 0$, then $f(t,x) = E_x g(B(t))$
for $t < 1/(2a)$ satisfies the heat equation with initial condition $f(0,x) = g(x)$, $x \in \mathbb{R}$.
A result of Widder (1944) states that any non-negative solution to the heat equation
can be represented as $\int p(t,x,y)\,dF(y)$ for some non-decreasing function $F$.
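As a quick numerical sanity check of (6.34), the following sketch evaluates the Gaussian transition density and compares central-difference estimates of $\partial p/\partial t$ and $\frac{1}{2}\partial^2 p/\partial x^2$. The function names, the step size `h` and the evaluation point are illustrative assumptions, not part of the text.

```python
import math

def p(t, x, y):
    """Transition density of Brownian motion: the N(x, t) density at y."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def backward_residual(t, x, y, h=1e-3):
    """Central-difference estimate of dp/dt - (1/2) d^2p/dx^2 at (t, x, y)."""
    dp_dt = (p(t + h, x, y) - p(t - h, x, y)) / (2.0 * h)
    d2p_dx2 = (p(t, x + h, y) - 2.0 * p(t, x, y) + p(t, x - h, y)) / h ** 2
    return dp_dt - 0.5 * d2p_dx2

# The residual should be close to zero, up to discretisation error.
residual = backward_residual(t=1.0, x=0.3, y=1.0)
print(residual)
```

Because $p$ depends on $x$ and $y$ only through $(y-x)^2$, the same check with the second difference taken in $y$ illustrates that $L^* = L$ here.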


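Example 6.7: The Black-Scholes SDE

$$dX(t) = \mu X(t)\,dt + \sigma X(t)\,dB(t)$$

for constants $\mu$ and $\sigma$. The generator of this diffusion is

$$Lf(x) = \frac{1}{2}\sigma^2 x^2 f''(x) + \mu x f'(x). \tag{6.35}$$

Its density is the fundamental solution of the PDE

$$\frac{\partial p}{\partial t} = \frac{1}{2}\sigma^2 x^2 \frac{\partial^2 p}{\partial x^2} + \mu x \frac{\partial p}{\partial x}.$$

The transition probability function of $X(t)$ was found in Example 5.14. Its density
is

$$p(t,x,y) = \frac{\partial}{\partial y}\,\Phi\Big(\frac{\ln(y/x) - (\mu - \frac{1}{2}\sigma^2)t}{\sigma\sqrt{t}}\Big).$$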
Itô's Formula and Martingales


If $X(t)$ is a solution of (6.25) then Itô's formula takes the form: for any twice
continuously differentiable $f(x)$

$$df(X(t)) = Lf(X(t))\,dt + f'(X(t))\sigma(X(t))\,dB(t). \tag{6.36}$$

Theorem 6.2 and Theorem 6.3 for time-homogeneous diffusions become


Theorem 6.11 Let $X$ be a solution to SDE (6.25) with coefficients satisfying
conditions of Theorem 5.4, that is, $\mu(x)$ and $\sigma(x)$ are Lipschitz and satisfy the
linear growth condition $|\mu(x)| + |\sigma(x)| \le K(1 + |x|)$. If $f(x)$ is twice
continuously differentiable in $x$ with derivatives growing not faster than exponential
satisfying condition (6.9), then the following process is a martingale.

$$M^f(t) = f(X(t)) - \int_0^t Lf(X(u))\,du. \tag{6.37}$$


Weak solutions to (6.25) are defined as solutions to the martingale problem,
by requiring existence of a filtered probability space, with an adapted process
$X(t)$, so that

$$f(X(t)) - \int_0^t Lf(X(u))\,du \tag{6.38}$$

is a martingale for any twice continuously differentiable $f$ vanishing outside
a finite interval (see Section 5.8). Equation (6.38) also allows us to identify
generators.


Remark 6.3: The concept of generator is a central concept in studies of 
Markov processes. The generator of a time-homogeneous Markov process (not 
necessarily a diffusion process) is a linear operator defined by: 


$$Lf(x) = \lim_{t \to 0} \frac{E(f(X(t)) \mid X(0) = x) - f(x)}{t}. \tag{6.39}$$


If the above limit exists we say that f is in the domain of the generator. 
If X(t) solves (6.25) and f is bounded and twice continuously differentiable, 
then from (6.39) the generator for a diffusion is obtained. This can be seen by 
interchanging the limit and the expectation (dominated convergence), using 
Taylor’s formula. Generators of pure jump processes, such as birth-death 
processes, are given later. For the theory of construction of Markov processes 
from their generators and their studies see for example, Dynkin (1965), Ethier 
and Kurtz (1986), Stroock and Varadhan (1979), Rogers and Williams (1990). 
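The Taylor-formula step mentioned in Remark 6.3 can be sketched as follows. This is a heuristic outline only, assuming that $f$ has bounded derivatives, that the remainder term is negligible after taking expectations, and that the first two conditional moments of the increment behave as indicated.

```latex
\begin{align*}
f(X(t)) - f(x) &= f'(x)\,(X(t) - x) + \tfrac{1}{2} f''(x)\,(X(t) - x)^2 + \text{remainder},\\
E_x\big(X(t) - x\big) &= \mu(x)\,t + o(t), \qquad
E_x\big(X(t) - x\big)^2 = \sigma^2(x)\,t + o(t),\\
\lim_{t \to 0} \frac{E_x f(X(t)) - f(x)}{t}
 &= \mu(x) f'(x) + \tfrac{1}{2}\sigma^2(x) f''(x) = Lf(x).
\end{align*}
```

The last line agrees with the generator (6.30) of the time-homogeneous diffusion.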


The result on existence and uniqueness of weak solutions (Theorem 5.11) be- 
comes 
Theorem 6.12 If $\sigma(x)$ is positive and continuous and for any $T > 0$ there is
$K_T$ such that for all $x \in \mathbb{R}$

$$|\mu(x)| + |\sigma(x)| \le K_T(1 + |x|) \tag{6.40}$$

then there exists a unique weak solution to SDE (6.25) starting at any point
$x \in \mathbb{R}$, moreover the solution has the strong Markov property.


The following result is specific for one-dimensional homogeneous diffusions and 
does not carry over to higher dimensions. 


Theorem 6.13 (Engelbert-Schmidt) The SDE

$$dX(t) = \sigma(X(t))\,dB(t)$$

has a weak solution for every initial value $X(0)$ if and only if for all $x \in \mathbb{R}$
the condition

$$\int_{-a}^{a} \frac{dy}{\sigma^2(x+y)} = \infty \quad \text{for all } a > 0$$

implies $\sigma(x) = 0$. The weak solution is unique if the above condition is
equivalent to $\sigma(x) = 0$.


Corollary 6.14 If $\sigma(x)$ is continuous (on $\mathbb{R}$) or bounded away from zero,
then the above SDE has a unique weak solution.


Example 6.8: By the above corollary Tanaka’s SDE, Example 5.15, 
dX(t) = sign(X(t))dB(t), X(0) = 0, 


has a unique weak solution. 


6.4 Exit Times from an Interval 


The main tool for studying various properties of diffusions is the result on
exit times from an interval. Define $\tau_{(a,b)}$ to be the first time the diffusion
exits $(a,b)$, $\tau_{(a,b)} = \inf\{t > 0 : X(t) \notin (a,b)\}$. Since $X(t)$ is continuous,
$X(\tau_{(a,b)}) = a$ or $b$. It was shown in Theorem 2.35 that $\tau$ is a stopping time,
moreover, since the filtration is right-continuous, $\{\tau < t\}$ and $\{\tau \ge t\}$ are in
$\mathcal{F}_t$ for all $t$. In this section results on $\tau_{(a,b)}$ are given. As the interval $(a,b)$ is
fixed, denote in this section $\tau_{(a,b)} = \tau$.

The fact that the process started in $(a,b)$ remains in $(a,b)$ for all $t < \tau$
allows us to construct martingales, without additional assumptions on functions
and coefficients. The following important result for analyzing diffusions, which
is also known as Dynkin's formula, is established first. Introduce $T_a$ and $T_b$ as
the hitting times of $a$ and $b$, $T_a = \inf\{t > 0 : X(t) = a\}$, with the convention
that the infimum of an empty set is infinity. Clearly, $\tau = \min(T_a, T_b) = T_a \wedge T_b$.
The next result is instrumental for obtaining properties of $\tau$.
Theorem 6.15 Let $X(t)$ be a diffusion with a continuous $\sigma(x) > 0$ on $[a,b]$
and $X(0) = x$, $a < x < b$. Then for any twice continuously differentiable
function $f(x)$ on $\mathbb{R}$ the following process is a martingale

$$f(X(t \wedge \tau)) - \int_0^{t \wedge \tau} Lf(X(s))\,ds. \tag{6.41}$$

Consequently,

$$E_x\Big(f(X(t \wedge \tau)) - \int_0^{t \wedge \tau} Lf(X(s))\,ds\Big) = f(x). \tag{6.42}$$

PROOF: Using Itô's formula and replacing $t$ by $t \wedge \tau$

$$f(X(t \wedge \tau)) - \int_0^{t \wedge \tau} Lf(X(s))\,ds = f(x) + \int_0^{t \wedge \tau} f'(X(s))\sigma(X(s))\,dB(s). \tag{6.43}$$

Write the Itô integral as $\int_0^t I(s \le \tau) f'(X(s))\sigma(X(s))\,dB(s)$. By Theorem 2.35
$\{\tau \ge s\}$ are in $\mathcal{F}_s$ for all $s$. Thus $I(s \le \tau)$ is adapted. Now, for any $s \le \tau$,
$X(s) \in [a,b]$. Since $f'(x)\sigma(x)$ is continuous on $[a,b]$, it is bounded on $[a,b]$, say
by $K$. Thus for any $s \le t$, $|I(s \le \tau) f'(X(s))\sigma(X(s))| \le K$, and the expectation
$E\int_0^t I(s \le \tau)\big(f'(X(s))\sigma(X(s))\big)^2 ds \le K^2 t$ is finite. Therefore the Itô integral
$\int_0^t I(s \le \tau) f'(X(s))\sigma(X(s))\,dB(s)$ is a martingale for $t \le T$, and any $T$. Since
it is a martingale it has a constant mean, and taking expectations in (6.43)
formula (6.42) is obtained.


The next result establishes in particular that $\tau$ has a finite expectation,
consequently it is finite with probability one.


Theorem 6.16 Let $X(t)$ be a diffusion with generator $L$ with continuous
$\sigma(x) > 0$ on $[a,b]$ and $X(0) = x$, $a < x < b$. Then $E_x(\tau) = v(x)$ satisfies
the following differential equation

$$Lv = -1, \tag{6.44}$$

with $v(a) = v(b) = 0$.

PROOF: Take $v(x)$ satisfying (6.44). By the previous Theorem 6.15

$$E_x\Big(v(X(t \wedge \tau)) - \int_0^{t \wedge \tau} Lv(X(s))\,ds\Big) = v(x). \tag{6.45}$$

But $Lv = -1$, therefore

$$E_x\big(v(X(t \wedge \tau))\big) + E_x(t \wedge \tau) = v(x), \tag{6.46}$$




and

$$E_x(t \wedge \tau) = v(x) - E_x\big(v(X(t \wedge \tau))\big). \tag{6.47}$$

$t \wedge \tau$ increases to $\tau$ as $t \to \infty$. Since the functions $v$ and $X$ are continuous,
$v(X(t \wedge \tau)) \to v(X(\tau))$ as $t \to \infty$. $X(t \wedge \tau) \in [a,b]$ for any $t$ and $v(x)$ is
bounded on $[a,b]$, say by $K$, therefore $|E_x(v(X(t \wedge \tau)))| \le K$. It follows from
the above equation (6.47), by monotone convergence, that $E_x(\tau) < \infty$, hence $\tau$ is
almost surely finite, moreover by dominated convergence
$E_x(v(X(t \wedge \tau))) \to E_x(v(X(\tau)))$. But $X(\tau) = a$ or $b$, so that $v(X(\tau)) = 0$.
Thus from (6.47) $E_x(\tau) = v(x)$.
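To make Theorem 6.16 concrete, the boundary value problem $Lv = -1$, $v(a) = v(b) = 0$ can be solved numerically. The sketch below uses a plain central finite-difference scheme, which is an illustrative choice rather than a method from the text; the function name `mean_exit_time`, the grid size `n`, and the requirement that `mu` and `sigma` be ordinary Python functions of a scalar are assumptions. For Brownian motion ($\mu = 0$, $\sigma = 1$) the equation is $\frac{1}{2}v'' = -1$, whose solution with zero boundary values is $v(x) = (x-a)(b-x)$.

```python
import numpy as np

def mean_exit_time(mu, sigma, a, b, n=201):
    """Solve (1/2) sigma^2 v'' + mu v' = -1 on (a, b), v(a) = v(b) = 0,
    by central finite differences. Returns the grid and v on the grid."""
    x = np.linspace(a, b, n)
    h = x[1] - x[0]
    A = np.zeros((n, n))
    rhs = -np.ones(n)
    # Dirichlet boundary conditions v(a) = v(b) = 0.
    A[0, 0] = 1.0
    A[-1, -1] = 1.0
    rhs[0] = 0.0
    rhs[-1] = 0.0
    for i in range(1, n - 1):
        d = 0.5 * sigma(x[i]) ** 2 / h ** 2   # second-derivative weight
        c = mu(x[i]) / (2.0 * h)              # first-derivative weight
        A[i, i - 1] = d - c
        A[i, i] = -2.0 * d
        A[i, i + 1] = d + c
    return x, np.linalg.solve(A, rhs)

# Brownian motion on (0, 1): compare with the closed form x (1 - x).
x, v = mean_exit_time(lambda s: 0.0, lambda s: 1.0, 0.0, 1.0)
max_error = float(np.max(np.abs(v - x * (1.0 - x))))
print(max_error)
```

Passing a different drift, for instance `lambda s: -s` for an Ornstein-Uhlenbeck process with unit parameters, gives an approximation to $E_x(\tau)$ in a case without such a simple closed form.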


The probability that the process reaches $b$ before it reaches $a$, that is,
$P_x(T_b < T_a)$, is used to obtain further properties. This probability is calculated
with the help of the function $S(x)$, which is a solution to the equation

$$\frac{1}{2}\sigma^2(x)S''(x) + \mu(x)S'(x) = 0, \quad \text{or} \quad LS = 0. \tag{6.48}$$

Any solution to (6.48) is called a harmonic function for $L$. Only positive
harmonic functions are of interest, and ruling out constant solutions, it is easy
to see (see Example 6.9 below) that for any such function

$$S'(x) = C \exp\Big(-\int_{x_0}^{x} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big). \tag{6.49}$$

$S'(x)$ is either positive for all $x$, if $C > 0$, or negative for all $x$, if $C < 0$.
Consequently, if $S$ is not identically constant, $S(x)$ is monotone. Assume that
$\sigma(x)$ is continuous and positive, and $\mu(x)$ is continuous. Then any $L$-harmonic
function comes from the general solution to (6.48), which is given by

$$S(x) = \int^{x} \exp\Big(-\int^{u} \frac{2\mu(y)}{\sigma^2(y)}\,dy\Big)\,du, \tag{6.50}$$

and involves two undetermined constants.


Example 6.9: We show that harmonic functions for $L$ are given by (6.50). $S$ must
solve

$$\frac{1}{2}\sigma^2(x)S''(x) + \mu(x)S'(x) = 0. \tag{6.51}$$

This equation leads to (with $h = S'$)

$$h'/h = -2\mu(x)/\sigma^2(x),$$

and provided $\mu(x)/\sigma^2(x)$ is integrable,

$$S'(x) = \exp\Big(-\int^{x} \frac{2\mu(y)}{\sigma^2(y)}\,dy\Big).$$

Integrating again we find $S(x)$.
Theorem 6.17 Let $X(t)$ be a diffusion with generator $L$ with continuous
$\sigma(x) > 0$ on $[a,b]$. Let $X(0) = x$, $a < x < b$. Then

$$P_x(T_b < T_a) = \frac{S(x) - S(a)}{S(b) - S(a)}, \tag{6.52}$$

where $S(x)$ is given by (6.50).

PROOF: By Theorem 6.15

$$E_x\Big(S(X(t \wedge \tau)) - \int_0^{t \wedge \tau} LS(X(s))\,ds\Big) = S(x). \tag{6.53}$$

But $LS = 0$, therefore

$$E_x\big(S(X(t \wedge \tau))\big) = S(x). \tag{6.54}$$

Since $\tau$ is finite it takes values $T_b$ with probability $P_x(T_b < T_a)$ and $T_a$ with
the complementary probability. It is not a bounded stopping time, but by
taking limit as $t \to \infty$, we can assert by dominated convergence, that

$$E_x S(X(\tau)) = E_x S(X(0)) = S(x).$$

Expanding the expectation on the left,
$S(b)P_x(T_b < T_a) + S(a)\big(1 - P_x(T_b < T_a)\big) = S(x)$, and rearranging gives the result.


Remark 6.4: Note that the ratio in (6.52) remains the same no matter what 
non-constant solution S(x) to the equation (6.50) is used. 

Note that although the proof is given under the assumption of continuous
drift $\mu(x)$, the result holds true for $\mu(x)$ bounded on finite intervals (see for
example, Pinsky (1995)).
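Formula (6.52) lends itself to direct numerical evaluation. The sketch below approximates the scale function on a grid by trapezoidal quadrature and returns $P_x(T_b < T_a)$; the helper names, the grid size and the choice of normalisation $S(a) = 0$ are illustrative assumptions (by Remark 6.4 the normalisation does not affect the ratio), and `mu`, `sigma` are assumed to be functions that accept NumPy arrays.

```python
import numpy as np

def cumulative_trapezoid(values, grid):
    """Running trapezoidal integral of `values` over `grid`, starting at 0."""
    steps = 0.5 * (values[1:] + values[:-1]) * np.diff(grid)
    return np.concatenate(([0.0], np.cumsum(steps)))

def prob_b_before_a(mu, sigma, a, b, x, n=2001):
    """P_x(T_b < T_a) = (S(x) - S(a)) / (S(b) - S(a)), with S(a) = 0."""
    grid = np.linspace(a, b, n)
    inner = cumulative_trapezoid(2.0 * mu(grid) / sigma(grid) ** 2, grid)
    S = cumulative_trapezoid(np.exp(-inner), grid)
    return float(np.interp(x, grid, S) / S[-1])

one = lambda x: np.ones_like(x)
# Zero drift: exit probabilities are proportional to distances.
p_bm = prob_b_before_a(lambda x: np.zeros_like(x), one, -1.0, 3.0, 0.0)
# Ornstein-Uhlenbeck drift -x on a symmetric interval, started at the centre.
p_ou = prob_b_before_a(lambda x: -x, one, -1.0, 1.0, 0.0)
# Constant drift 1: closed form (1 - e^{-2(x-a)}) / (1 - e^{-2(b-a)}).
p_drift = prob_b_before_a(one, one, 0.0, 1.0, 0.5)
print(p_bm, p_ou, p_drift)
```

The zero-drift case reproduces $(x-a)/(b-a)$, and the symmetric Ornstein-Uhlenbeck case gives one half, as symmetry suggests.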


The above theorem has a number of far reaching corollaries. 


Corollary 6.18 Let $X(t)$ be a diffusion with zero drift on $(a,b)$, $X(0) = x$,
$a < x < b$. Then

$$P_x(T_b < T_a) = \frac{x - a}{b - a}. \tag{6.55}$$

PROOF: If $\mu(x) = 0$ on $(a,b)$, then $S(x)$ is a linear function on $(a,b)$ and the
result follows from (6.52).


Diffusion S(X(t)) has zero drift by Itô’s formula and (6.48). The exit 
probabilities from an interval are proportional to the distances from the end 
points. This explains why the function S(x) is called the scale function. The 
diffusion S(X(t)) is said to be on the natural scale. 
Example 6.10: We specify $P_x(T_b < T_a)$ for Brownian motion and Ornstein-Uhlenbeck
processes. Brownian motion is in natural scale, since $\mu(x) = 0$. Thus
$S(x) = x$, and $P_x(T_b < T_a) = (x - a)/(b - a)$.

For Ornstein-Uhlenbeck process with parameters $\mu(x) = -\alpha x$, $\sigma^2(x) = \sigma^2$,

$$S(x) = \int^{x} \exp\Big(\frac{\alpha}{\sigma^2}y^2\Big)\,dy.$$

Consequently

$$P_x(T_b < T_a) = \frac{\int_a^x \exp\big(\frac{\alpha}{\sigma^2}y^2\big)\,dy}{\int_a^b \exp\big(\frac{\alpha}{\sigma^2}y^2\big)\,dy}.$$


Under standard assumptions, there is a positive probability for the diffusion
process to reach any point from any starting point.


Corollary 6.19 Let $X(t)$ be a diffusion satisfying the assumptions of Theorem
6.17. Then for any $x, y \in (a,b)$

$$P_x(T_y < \infty) > 0. \tag{6.56}$$

Indeed, for $x < y$, $T_y \le T_b$, and
$P_x(T_y < \infty) \ge P_x(T_b < \infty) \ge P_x(T_b < T_a) > 0$, and similarly for $y < x$.

As an application of the properties of the scale function, a better result 
for the existence and uniqueness of strong solutions is obtained. The transfor- 
mation Y(t) = S(X(t)) results in a diffusion with no drift which allows us to 
waive assumptions on the drift. 


Theorem 6.20 (Zvonkin) Suppose that $\mu(x)$ is bounded and $\sigma(x)$ is Lipschitz
and is bounded away from zero. Then the strong solution exists and is
unique. In particular any SDE of the form

$$dX(t) = \mu(X(t))\,dt + \sigma\,dB(t), \quad \text{and} \quad X(0) = x_0, \tag{6.57}$$

with any bounded $\mu(x)$ and constant $\sigma$ has a unique strong solution.


PROOF: If $Y(t) = S(X(t))$, then $dY(t) = \sigma(X(t))S'(X(t))\,dB(t)$. Notice
that $S(x)$ is strictly increasing, therefore $X(t) = h(Y(t))$, where $h$ is the
inverse to $S$. Thus the SDE for $Y(t)$ is $dY(t) = \sigma(h(Y(t)))S'(h(Y(t)))\,dB(t) =
\sigma_Y(Y(t))\,dB(t)$. The rest of the proof consists in verification that $\sigma_Y(x) =
\sigma(h(x))S'(h(x))$ is locally Lipschitz and, under the stated assumptions,
satisfies conditions of the existence and uniqueness result.
6.5 Representation of Solutions of ODEs


Solutions to some PDEs have stochastic representations. Such representations 
are given by Theorems 6.6 and 6.8. Here we show that if a solution to an ODE 
satisfying a given boundary conditions exists, then it has a representation as 
the expectation of a diffusion process stopped at the boundary. 

Theorem 6.21 Let $X(t)$ be a diffusion with generator $L$ with time-independent
coefficients, $L = \frac{1}{2}\sigma^2(x)\frac{d^2}{dx^2} + \mu(x)\frac{d}{dx}$, continuous $\sigma(x) > 0$ on $[a,b]$, and
$X(0) = x$, $a < x < b$. If $f$ is twice continuously differentiable in $(a,b)$ and
continuous on $[a,b]$ and solves

$$Lf = -\phi \ \text{ in } (a,b), \quad f(a) = g(a), \quad f(b) = g(b) \tag{6.58}$$

for some bounded functions $g$ and $\phi$, then $f$ has the representation

$$f(x) = E_x\big(g(X(\tau))\big) + E_x\Big(\int_0^{\tau} \phi(X(s))\,ds\Big), \tag{6.59}$$

where $\tau$ is the exit time from $(a,b)$. In particular if $\phi \equiv 0$, the solution of
(6.58) is given by

$$f(x) = E_x\big(g(X(\tau))\big). \tag{6.60}$$

PROOF: The proof is immediate from Theorem 6.15. Indeed, by (6.42)

$$E_x\Big(f(X(t \wedge \tau)) - \int_0^{t \wedge \tau} Lf(X(u))\,du\Big) = f(x).$$

Since $\tau$ is finite, by taking limits as $t \to \infty$ by dominated convergence

$$E_x\big(f(X(\tau))\big) = f(x) + E_x\Big(\int_0^{\tau} Lf(X(u))\,du\Big).$$

But $Lf(X(u)) = -\phi(X(u))$ for any $u < \tau$, and $X(\tau)$ is in the boundary $\{a,b\}$,
where $f(x) = g(x)$, and the result follows.

Example 6.11: Let $X = B$ be Brownian motion. Consider the solution of the
problem

$$\frac{1}{2}f''(x) = 0 \ \text{ in } (a,b), \quad f(a) = 0, \quad f(b) = 1.$$

Here we solve this problem directly and verify the result of Theorem 6.21. Clearly,
the solution is a linear function, and from boundary conditions it follows that it must
be $f(x) = (x - a)/(b - a)$. By (6.60) this solution has the representation

$$f(x) = E_x\big(g(B(\tau))\big) = g(a)P_x(T_a < T_b) + g(b)\big(1 - P_x(T_a < T_b)\big) = P_x(T_b < T_a).$$

As we know, for Brownian motion, $P_x(T_b < T_a) = (x - a)/(b - a)$, and the result is
verified.
6.6 Explosion


Explosion refers to the situation when the process reaches infinite values in
finite time. For example, the function $1/(1 - t)$, $t < 1$, explodes at $t = 1$.
Similarly, the solution $x(t)$ to the ordinary differential equation

$$dx(t) = (1 + x^2(t))\,dt, \quad x(0) = 0$$

explodes. Indeed, consider $x(t) = \tan(t)$, which approaches infinity as $t$
approaches $\pi/2$. The time of explosion is $\pi/2$. A similar situation can occur with
solutions to SDEs, except the time of explosion will be random. Solutions can
be considered until the time of explosion.
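The deterministic explosion can be observed numerically. The following sketch integrates $dx = (1 + x^2)\,dt$ with a plain Euler scheme and records the first time the solution exceeds a large threshold; the step size `dt` and the `threshold` are arbitrary illustrative choices, and the reported time only approximates $\pi/2$ up to discretisation error.

```python
import math

def blow_up_time(dt=1e-4, threshold=1e6):
    """Euler scheme for dx = (1 + x^2) dt, x(0) = 0.
    Returns the first grid time at which |x| reaches `threshold`."""
    t, x = 0.0, 0.0
    while abs(x) < threshold:
        x += (1.0 + x * x) * dt
        t += dt
    return t

t_star = blow_up_time()
print(t_star, math.pi / 2)
```

For an SDE the same experiment would be run path by path, with the blow-up time varying randomly from path to path.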

Let diffusion $X(t)$ satisfy SDE on $\mathbb{R}$

$$dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dB(t), \quad \text{and} \quad X(0) = x. \tag{6.61}$$

Let $D_n = (-n, n)$ for $n = 1, 2, \ldots$. $\tau_n = \tau_{D_n}$ is the first time the process has
absolute value $n$. Since a diffusion process is continuous, it must reach level
$n$ before it reaches level $n + 1$. Therefore $\tau_n$ are non-decreasing, hence they
converge to a limit $\tau_\infty = \lim_{n \to \infty} \tau_n$. Explosion occurs on the set $\{\tau_\infty < \infty\}$,
because on this set, by continuity of $X(t)$, $X(\tau_\infty) = \lim_{n \to \infty} X(\tau_n)$. Thus
$|X(\tau_\infty)| = \lim_{n \to \infty} |X(\tau_n)| = \lim_{n \to \infty} n = \infty$, and infinity is reached in finite
time on this set.


Definition 6.22 Diffusion started from $x$ explodes if $P_x(\tau_\infty < \infty) > 0$.


Note that under appropriate conditions on the coefficients, if diffusion
explodes when started at some $x_0$, then it explodes when started at any
$x \in \mathbb{R}$. Indeed, if for any $x, y$, $P_x(T_y < \infty) > 0$ (see Corollary 6.19), then
$P_y(\tau_\infty < \infty) \ge P_y(T_x < \infty)P_x(\tau_\infty < \infty) > 0$.

The result below gives necessary and sufficient conditions for explosions. 
It is known as Feller’s test for explosions. 


Theorem 6.23 Suppose $\mu(x)$, $\sigma(x)$ are bounded on finite intervals, and $\sigma(x) > 0$
and is continuous. Then the diffusion process explodes if and only if one of
the two following conditions holds. There exists $x_0$ such that

1. $\displaystyle \int_{-\infty}^{x_0} \exp\Big(-\int_{x_0}^{x} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)\bigg(\int_{x}^{x_0} \frac{\exp\big(\int_{x_0}^{y} \frac{2\mu(s)}{\sigma^2(s)}\,ds\big)}{\sigma^2(y)}\,dy\bigg)\,dx < \infty.$

2. $\displaystyle \int_{x_0}^{\infty} \exp\Big(-\int_{x_0}^{x} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)\bigg(\int_{x_0}^{x} \frac{\exp\big(\int_{x_0}^{y} \frac{2\mu(s)}{\sigma^2(s)}\,ds\big)}{\sigma^2(y)}\,dy\bigg)\,dx < \infty.$

The proof relies on the analysis of exit times $\tau_n$ with the aid of Theorem 6.16
and the Feynman-Kac formula (see for example, Gihman and Skorohod (1972),
p.163, Pinsky (1995) p.213-214).

If the drift coefficient $\mu(x) \equiv 0$, then both conditions in the above theorem
fail, since $\int_{x}^{x_0} \sigma^{-2}(y)\,dy$ does not tend to $0$ as $x \to -\infty$ (and similarly
at $+\infty$), hence the following result.


Corollary 6.24 SDEs of the form $dX(t) = \sigma(X(t))\,dB(t)$ do not explode.


Example 6.12: Consider the SDE $dX(t) = cX^r(t)\,dt + dB(t)$, $c > 0$. Solutions of
$dx(t) = cx^r(t)\,dt$ explode if and only if $r > 1$, see Exercise 6.11. Here $\sigma(x) = 1$ and
$\mu(x) = cx^r$, $c > 0$, and $D = (\alpha, \beta) = (0, \infty)$. It is clear that this diffusion drifts to
$+\infty$ due to the positive drift for any $r > 0$. However, explosion occurs only in the
case of $r > 1$, that is, $P_x(\tau_D < \infty) > 0$ if $r > 1$, and $P_x(\tau_D < \infty) = 0$ if $r \le 1$. The
integral in part 2 of the above theorem is

$$\int_{x_0}^{\infty} \frac{\int_{x_0}^{x} \exp\big(\frac{2c}{r+1}y^{r+1}\big)\,dy}{\exp\big(\frac{2c}{r+1}x^{r+1}\big)}\,dx.$$

Using l'Hôpital's rule, it can be seen that the function under the integral is of order
$x^{-r}$ as $x \to \infty$. Since $\int_{x_0}^{\infty} x^{-r}\,dx < \infty$ if and only if $r > 1$, the result is established
(see Pinsky (1995)).
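The l'Hôpital step in Example 6.12 can be written out as follows. This is a sketch of the asymptotic computation only, with $G(x) = \frac{2c}{r+1}x^{r+1}$ introduced here as shorthand.

```latex
\begin{align*}
G(x) &= \frac{2c}{r+1}\,x^{r+1}, \qquad G'(x) = 2c\,x^{r},\\
\lim_{x \to \infty} \frac{\int_{x_0}^{x} e^{G(y)}\,dy \,\big/\, e^{G(x)}}{x^{-r}}
 &= \lim_{x \to \infty} \frac{\int_{x_0}^{x} e^{G(y)}\,dy}{x^{-r} e^{G(x)}}
 = \lim_{x \to \infty} \frac{e^{G(x)}}{\big(2c - r\,x^{-r-1}\big)\, e^{G(x)}}
 = \frac{1}{2c}.
\end{align*}
```

So the integrand behaves like $x^{-r}/(2c)$ for large $x$, which is integrable at infinity exactly when $r > 1$.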
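Example 6.13: Consider the SDE $dX(t) = X^2(t)\,dt + X^r(t)\,dB(t)$. Using the
integral test, it can be seen that if $r < 3/2$ there is an explosion, and if $r > 3/2$
there is no explosion. (For $r < 3/2$ the integrand in part 2 of Theorem 6.23 is of
order $x^{-2}$, which is integrable at infinity; for $r > 3/2$ the integrand tends to a
positive constant.)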


6.7 Recurrence and Transience 


Let X(t) be a diffusion on R. There are various definitions of recurrence and 
transience in the literature, however, under the imposed assumptions on the 
coefficients, they are all equivalent. 


Definition 6.25 A point $x$ is called recurrent for diffusion $X(t)$ if the
probability of the process coming back to $x$ infinitely often is one, that is,

$$P_x\big(X(t) = x \text{ for a sequence of } t\text{'s increasing to infinity}\big) = 1.$$

Definition 6.26 A point $x$ is called transient for diffusion $X(t)$ if

$$P_x\big(\lim_{t \to \infty} |X(t)| = \infty\big) = 1.$$


If all points of a diffusion are recurrent, the diffusion itself is called recurrent.
If all points of a diffusion are transient, the diffusion itself is called transient.
Theorem 6.27 Let $X(t)$ be a diffusion on $\mathbb{R}$ satisfying assumptions of the
existence and uniqueness result Theorem 5.11, that is, $\mu(x)$ and $\sigma(x)$ are bounded
on finite intervals, $\sigma(x)$ is continuous and positive and $\mu(x)$, $\sigma(x)$ satisfy the
linear growth condition. Then


1. If there is one recurrent point then all points are recurrent.
2. If there are no recurrent points, then the diffusion is transient.


To prove this result two fundamental properties of diffusions are used. The first
is the strong Markov property, and the second is the strong Feller property,
which states that for any bounded function $f(x)$, $E_x f(X(t))$ is a continuous
function in $x$ for any $t > 0$. Both these properties hold under the stated
conditions. It also can be seen that the recurrence is equivalent to the property:
for any $x, y$, $P_x(T_y < \infty) = 1$, where $T_y$ is the hitting time of $y$. By the above
Theorem, transience is equivalent to the property: for any $x, y$, $P_x(T_y < \infty) < 1$,
see for example, Pinsky (1995). To decide whether $P_x(T_y < \infty) < 1$, the
formula (6.52) for the probability of exit from one end of an interval in terms of
the scale function is used. If a diffusion does not explode, then the hitting time
of infinity is defined as $T_\infty = \lim_{b \to \infty} T_b = \infty$ and $T_{-\infty} = \lim_{a \to -\infty} T_a = \infty$.
Recall that $S(x) = \int_{x_0}^{x} \exp\big(-\int_{x_0}^{u} \frac{2\mu(s)}{\sigma^2(s)}\,ds\big)\,du$, and that by (6.52)
$P_x(T_b < T_a) = (S(x) - S(a))/(S(b) - S(a))$. Take any $y > x$, then

$$P_x(T_y < \infty) = \lim_{a \to -\infty} P_x(T_y < T_a) = \lim_{a \to -\infty} \frac{S(x) - S(a)}{S(y) - S(a)}. \tag{6.62}$$

Thus if $S(-\infty) = \lim_{a \to -\infty} S(a) = -\infty$, then $P_x(T_y < \infty) = 1$.
Similarly, for $y < x$,

$$P_x(T_y < \infty) = \lim_{b \to \infty} P_x(T_y < T_b) = \lim_{b \to \infty} \frac{S(x) - S(b)}{S(y) - S(b)}.$$

Thus if $S(\infty) = \lim_{b \to \infty} S(b) = \infty$, then $P_x(T_y < \infty) = 1$. Thus for any $y$,
$P_x(T_y < \infty) = 1$, which is the recurrence property.
If one of the values $S(-\infty)$ or $S(\infty)$ is finite, then for some $y$,
$P_x(T_y < \infty) < 1$, which is the transience property. Thus the necessary and
sufficient conditions for recurrence and transience are given by


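Theorem 6.28 Let operator $L = \frac{1}{2}\sigma^2(x)\frac{d^2}{dx^2} + \mu(x)\frac{d}{dx}$ have coefficients that
satisfy assumptions of the above Theorem 6.27. Denote for a fixed $x_0$,

$$I_1 = \int_{-\infty}^{x_0} \exp\Big(-\int_{x_0}^{u} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)\,du \quad \text{and} \quad I_2 = \int_{x_0}^{\infty} \exp\Big(-\int_{x_0}^{u} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)\,du.$$

The diffusion corresponding to $L$ is recurrent if and only if both $I_1$ and $I_2$ are
infinite, and transient otherwise, that is, when one of $I_1$ or $I_2$ is finite.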
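As an illustration of Theorem 6.28, consider Brownian motion with constant drift, $dX(t) = \mu\,dt + \sigma\,dB(t)$. The computation below is a worked sketch with the assumed choice $x_0 = 0$ and $\mu > 0$ (the case $\mu < 0$ is symmetric).

```latex
\begin{align*}
\exp\Big(-\int_{0}^{u} \frac{2\mu}{\sigma^2}\,ds\Big) &= e^{-2\mu u/\sigma^2},\\
I_1 &= \int_{-\infty}^{0} e^{-2\mu u/\sigma^2}\,du = \infty, \qquad
I_2 = \int_{0}^{\infty} e^{-2\mu u/\sigma^2}\,du = \frac{\sigma^2}{2\mu} < \infty .
\end{align*}
```

Since $I_2$ is finite the process is transient. For $\mu = 0$ the integrand equals 1, both $I_1$ and $I_2$ are infinite, and the process (a scaled Brownian motion) is recurrent.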
6.8 Diffusion on an Interval


Consider diffusion on an interval $(\alpha, \beta)$, where one of the ends is finite. The
main difference between this case and diffusion on the whole line, is that the
finite end of the interval may be attained in finite time, the situation analogous
to explosion in the former case. Take $\alpha$ finite, that is $-\infty < \alpha < \beta \le \infty$. The
case of finite $\beta$ is similar. Write the scale function in the form

$$S(x) = \int_{x_1}^{x} \exp\Big(-\int_{x_0}^{u} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)\,du, \tag{6.63}$$

where $x_0, x_1 \in (\alpha, \beta)$. By using stopping of $S(X(t))$, a martingale is obtained,
and by the same argument as before the probabilities of exit from an interval
$(a,b) \subset (\alpha, \beta)$ are given by (6.52)

$$P_x(T_a < T_b) = (S(b) - S(x))/(S(b) - S(a)).$$

If $S(\alpha) = -\infty$, then the above probability can be made arbitrarily small by
taking $a \to \alpha$. This means that $P_x(T_\alpha < T_b) = 0$, and the boundary $\alpha$ is not
attained before $b$ for any $b$. If $S(\alpha) > -\infty$, then $P_x(T_\alpha < T_b) > 0$. A result
similar to Theorem 6.28 holds.


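Theorem 6.29 Let $L_1 = \int_{\alpha}^{b} \exp\big(-\int_{b}^{u} \frac{2\mu(s)}{\sigma^2(s)}\,ds\big)\,du$. If $L_1 = \infty$ then the
diffusion attains the point $b$ before $\alpha$, for any initial point $x \in (\alpha, b)$. If $L_1 < \infty$,
then let

$$L_2 = \int_{\alpha}^{b} \frac{1}{\sigma^2(y)} \bigg(\int_{\alpha}^{y} \exp\Big(-\int_{b}^{x} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)\,dx\bigg) \exp\Big(\int_{b}^{y} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)\,dy.$$

1. If $L_2 < \infty$ then for all $x \in (\alpha, b)$ the diffusion exits $(\alpha, b)$ in finite time,
moreover $P_x(T_\alpha < \infty) > 0$.


2. If $L_2 = \infty$ then either the exit time of $(\alpha, b)$ is infinite and $\lim_{t \to \infty} X(t) = \alpha$,
or the exit time of $(\alpha, b)$ is finite and $P_x(T_b < T_\alpha) = 1$.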


Example 6.14: Consider a diffusion given by the SDE

$$dX(t) = n\,dt + 2\sqrt{X(t)}\,dB(t), \tag{6.64}$$

where $n$ is a positive integer. It will be seen later that $X(t)$ is the squared distance
from the origin of the Brownian motion in $n$ dimensions, see (6.82) Section 6.10. If
$T_0$ is the first visit of zero, we show that if $n \ge 2$, then $P_x(T_0 = \infty) = 1$. This
means that $X(t)$ never visits zero, that is $P(X(t) > 0, \text{ for all } t \ge 0) = 1$. But for
$n = 1$, $P_x(T_0 < \infty) = 1$. For $n = 2$ the scale function $S(x)$ is given by $S(x) = \ln x$,
so that for any $b > 1$, $P_1(T_0 < T_b) = (S(1) - S(b))/(S(0) - S(b)) = 0$, hence
$P_1(T_0 < \infty) = 0$. For $n \ge 3$, the scale function is $S(x) = (1 - x^{-n/2+1})/(1 - n/2)$.
Therefore $P_1(T_0 < \infty) = (S(1) - S(\infty))/(S(0) - S(\infty)) = 0$. Thus for any $n \ge 2$,
$P_1(T_0 = \infty) = 1$. Directly, $\alpha = 0$, by the above theorem $L_1 = \infty$, and the result
follows. When $n = 1$ calculations show that $L_1 < \infty$ and also $L_2 < \infty$, thus
$P_1(T_0 < \infty) > 0$.
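The limits in Example 6.14 can be inspected numerically. The sketch below uses the increasing scale function with $S'(x) = x^{-n/2}$, which differs from the form quoted in the example only by an affine transformation and so gives the same exit probabilities; the function names, the starting point $x = 1$ and the upper level $b = 4$ are illustrative choices.

```python
import math

def scale(x, n):
    """A scale function for dX = n dt + 2 sqrt(X) dB, with S'(x) = x**(-n/2)."""
    if n == 2:
        return math.log(x)
    return x ** (1.0 - n / 2.0) / (1.0 - n / 2.0)

def prob_a_before_b(x, a, b, n):
    """P_x(T_a < T_b) = (S(b) - S(x)) / (S(b) - S(a)) for 0 < a < x < b."""
    return (scale(b, n) - scale(x, n)) / (scale(b, n) - scale(a, n))

# Let the lower level a decrease to 0 and watch the probability of
# reaching it before b = 4, starting from x = 1.
for n in (1, 2, 3):
    print(n, [prob_a_before_b(1.0, a, 4.0, n) for a in (1e-2, 1e-6, 1e-12)])
```

For $n = 1$ the values stabilise at a positive limit, while for $n = 2$ (slowly, at a logarithmic rate) and $n = 3$ they tend to zero, in line with the example.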


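Remark 6.5: There is a classification of boundary points depending on
the constants $L_1$, $L_2$ and $L_3$, where $L_3 = \int_{\alpha}^{b} \frac{1}{\sigma^2(y)} \exp\big(\int_{b}^{y} \frac{2\mu(s)}{\sigma^2(s)}\,ds\big)\,dy$. The
boundary $\alpha$ is called


1. natural, if $L_1 = \infty$;
2. attracting, if $L_1 < \infty$, $L_2 = \infty$;
3. absorbing, if $L_1 < \infty$, $L_2 < \infty$, $L_3 = \infty$;
4. regular, if $L_1 < \infty$, $L_2 < \infty$, $L_3 < \infty$.
See for example, Gihman and Skorohod (1972), p.165.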


6.9 Stationary Distributions 


Consider the diffusion process given by the SDE

$$dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dB(t),$$

with $X(0)$ having a distribution $\nu(x) = P(X(0) \le x)$. The distribution $\nu(x)$ is
called stationary or invariant for the diffusion process $X(t)$ if for any $t$ the
distribution of $X(t)$ is the same as $\nu(x)$. If $P(t,x,y)$ denotes the transition
probability function of the process $X(t)$, that is, $P(t,x,y) = P(X(t) \le y \mid X(0) = x)$,
then an invariant $\nu(x)$ satisfies

$$\nu(y) = \int P(t,x,y)\,d\nu(x). \tag{6.65}$$

To justify (6.65) use the total probability formula and the fact that the
stationary distribution is the distribution of $X(t)$ for all $t$,

$$P(X_0 \le y) = P(X_t \le y) = \int P(X_t \le y \mid X_0 = x)\,d\nu(x). \tag{6.66}$$

If the stationary distribution has a density, $\pi(x) = d\nu(x)/dx$, then $\pi(x)$ is
called a stationary or invariant density. If $p(t,x,y) = \partial P(t,x,y)/\partial y$ denotes
the density of $P(t,x,y)$, then a stationary $\pi$ satisfies

$$\pi(y) = \int p(t,x,y)\pi(x)\,dx. \tag{6.67}$$

Under appropriate conditions on the coefficients ($\mu$ and $\sigma$ are twice continuously
differentiable with second derivatives satisfying a Hölder condition) an
invariant density exists if and only if the following two conditions hold
1. $\displaystyle \int_{-\infty}^{x_0} \exp\Big(-\int_{x_0}^{x} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)\,dx = \int_{x_0}^{\infty} \exp\Big(-\int_{x_0}^{x} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)\,dx = \infty,$


2. $\displaystyle \int_{-\infty}^{\infty} \frac{1}{\sigma^2(x)} \exp\Big(\int_{x_0}^{x} \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)\,dx < \infty.$


Furthermore, if an invariant density is twice continuously differentiable, then
it satisfies the ordinary differential equation

$$L^*\pi = 0, \quad \text{that is} \quad \frac{1}{2}\frac{\partial^2}{\partial y^2}\big(\sigma^2(y)\pi\big) - \frac{\partial}{\partial y}\big(\mu(y)\pi\big) = 0. \tag{6.68}$$

Moreover, any solution of this equation with finite integral defines an invariant
probability density. For rigorous proof see for example, Pinsky (1995),
p.219 and p.181. To justify equation (6.68) heuristically, recall that under
appropriate conditions the density of $X(t)$ satisfies the forward (Fokker-Planck)
equation (5.62). If the system is in a stationary regime, its distribution does
not change with time, which means that the derivative of the density with
respect to $t$ is zero, resulting in equation (6.68).

Equation (6.68) can be solved, as it can be reduced to a first order differential
equation (see (1.34)). Using the integrating factor $\exp\big(-\int \frac{2\mu(y)}{\sigma^2(y)}\,dy\big)$,
we find that the solution is given by

$$\pi(x) = \frac{C}{\sigma^2(x)} \exp\Big(\int_{x_0}^{x} \frac{2\mu(y)}{\sigma^2(y)}\,dy\Big), \tag{6.69}$$

where $C$ is found from $\int \pi(x)\,dx = 1$.
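Formula (6.69) is straightforward to evaluate on a grid. The following sketch computes the exponent by trapezoidal quadrature, normalises numerically, and compares the result with the $N(0, \sigma^2/(2\alpha))$ density for the Ornstein-Uhlenbeck drift $\mu(x) = -\alpha x$. The function name, the truncation of the real line to $[-8, 8]$ and the parameter values are illustrative assumptions; `mu` and `sigma` are assumed to accept NumPy arrays.

```python
import numpy as np

def stationary_density(mu, sigma, grid):
    """Evaluate (6.69) on `grid`, normalised to integrate to 1 over the grid."""
    ratio = 2.0 * mu(grid) / sigma(grid) ** 2
    dx = np.diff(grid)
    # Cumulative trapezoid for the exponent, with x0 = grid[0].
    exponent = np.concatenate(([0.0], np.cumsum(0.5 * (ratio[1:] + ratio[:-1]) * dx)))
    exponent -= exponent.max()          # constant shift, absorbed into C
    unnormalised = np.exp(exponent) / sigma(grid) ** 2
    mass = np.sum(0.5 * (unnormalised[1:] + unnormalised[:-1]) * dx)
    return unnormalised / mass

alpha, sig = 1.0, 1.0
grid = np.linspace(-8.0, 8.0, 4001)
pi_num = stationary_density(lambda x: -alpha * x,
                            lambda x: sig * np.ones_like(x), grid)
# Density of N(0, sig^2 / (2 alpha)).
pi_exact = np.exp(-alpha * grid ** 2 / sig ** 2) / np.sqrt(np.pi * sig ** 2 / alpha)
max_error = float(np.max(np.abs(pi_num - pi_exact)))
print(max_error)
```

The same function can be tried with other drifts, keeping in mind that a normalised output is meaningful only when condition 2 above holds, so that the integral is finite on the whole line.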


Example 6.15: For Brownian motion condition 1. above is true, but condition 2.
fails. Thus no stationary distribution exists. The forward equation for the invariant
distribution is

$$\frac{1}{2}\frac{\partial^2 \pi}{\partial x^2} = 0,$$

which has for its solutions linear functions of $x$ and none of these has a finite integral.

Example 6.16: The forward equation for the Ornstein-Uhlenbeck process is

$$\frac{\partial p}{\partial t} = L^* p = \frac{1}{2}\sigma^2\frac{\partial^2 p}{\partial x^2} + \alpha\frac{\partial}{\partial x}(xp).$$

The solution is given by

$$\pi(x) = \frac{C}{\sigma^2} \exp\Big(\int_0^x \frac{-2\alpha y}{\sigma^2}\,dy\Big) = \frac{C}{\sigma^2} \exp\Big(-\frac{\alpha}{\sigma^2}x^2\Big). \tag{6.70}$$

This shows that if $\alpha$ is negative, no stationary distribution exists, and if $\alpha$ is positive
then the stationary density is Normal $N(0, \sigma^2/(2\alpha))$. The fact that $N(0, \sigma^2/(2\alpha))$ is
a stationary distribution can be easily verified directly from representation (5.13).
Remark 6.6: The Ornstein-Uhlenbeck process has the following properties:
it is a Gaussian process with continuous paths, it is Markov, and it is stationary,
provided the initial distribution is the stationary distribution $N(0, \sigma^2/(2\alpha))$.
Stationarity means that finite-dimensional distributions do not change with
shift in time. For Gaussian processes stationarity is equivalent to the covariance
function being a function of $|t - s|$ only, i.e. $\mathrm{Cov}(X(t), X(s)) = h(|t - s|)$
(see Exercise (6.3)). The Ornstein-Uhlenbeck process is the only process that
is simultaneously Gaussian, Markov and stationary (see for example Breiman
(1968), p.350).


Invariant Measures 


A measure $\nu$ is called invariant for $X(t)$ if it satisfies the equation
$\nu(B) = \int_{-\infty}^{\infty} P(t,x,B)\,d\nu(x)$ for all intervals $B$. In equation (6.65) intervals of
the form $B = (-\infty, y]$ were used. The general equation reduces to (6.65) if
$C = \nu(\mathbb{R}) < \infty$. In this case $\nu$ can be normalized to a probability distribution.
If $C = \infty$ this is impossible. Densities of invariant measures, when they
exist and are smooth enough, satisfy equation (6.67). Conversely, any positive
solution to (6.67) is a density of an invariant measure.

For Brownian motion $\pi(x) = 1$ is a solution of the equation (6.67). This
is seen as follows. Since $p(t,x,y)$ is the density of the $N(x,t)$ distribution,
$p(t,x,y) = \frac{1}{\sqrt{2\pi t}} \exp\big(-(y - x)^2/(2t)\big)$. Note that for a fixed $y$, as a function of
$x$, it is also the density of the $N(y,t)$ distribution. Therefore it integrates to
unity, $\int_{\mathbb{R}} p(t,x,y)\,dx = 1$. Thus $\pi(x) = 1$ is a positive solution of the equation
(6.67). In this case the density 1 corresponds to the Lebesgue measure, which
is an invariant measure for Brownian motion. Since $\int_{\mathbb{R}} 1\,dx = \infty$, it cannot
be normalized to a probability density. Note also that since the mean of the
$N(y,t)$ distribution is $y$, we have

$$\int_{-\infty}^{\infty} x\,p(t,x,y)\,dx = y.$$

So that $\pi(x) = x$ is also a solution of the equation (6.67), but it is not a
positive solution, and therefore is not a density of an invariant measure.

An interpretation of the invariant measure which is not a probability mea- 
sure may be given by the density of a large number (infinite number) of par- 
ticles with locations corresponding to the invariant measure, all diffusing ac- 
cording to the diffusion equation. Then at any time the density of the particles 
at any location will be preserved. 
6.10 Multi-dimensional SDEs


We cover the concepts very briefly, relying on analogy with the one-dimensional
case, but state the differences arising due to the increase in dimension. Let
$X(t)$ be a diffusion in $n$ dimensions, described by the multi-dimensional SDE

$$dX(t) = b(X(t),t)\,dt + \sigma(X(t),t)\,dB(t), \tag{6.71}$$

where $\sigma$ is an $n \times d$ matrix valued function, $B$ is $d$-dimensional Brownian motion,
see section 4.7, $X$, $b$ are $n$-dimensional vector valued functions. In coordinate
form this reads

$$dX_i(t) = b_i(X(t),t)\,dt + \sum_{j=1}^{d} \sigma_{ij}(X(t),t)\,dB_j(t), \quad i = 1, \ldots, n, \tag{6.72}$$

and it means that for all $t > 0$ and $i = 1, \ldots, n$

$$X_i(t) = X_i(0) + \int_0^t b_i(X(u),u)\,du + \sum_{j=1}^{d} \int_0^t \sigma_{ij}(X(u),u)\,dB_j(u). \tag{6.73}$$

The coefficients of the SDE are: the vector $b(x,t)$ and the matrix $\sigma(x,t)$.

An existence and uniqueness result for strong solutions, under the assumption
of locally Lipschitz coefficients holds in the same form, see Theorem 5.4,
except for absolute values that should be replaced by the norms. The norm
of the vector is its length, $|b| = \sqrt{\sum_{i=1}^{n} b_i^2}$. The norm of the matrix $\sigma$ is
defined by $|\sigma|^2 = \mathrm{trace}(\sigma\sigma^{Tr})$, with $\sigma^{Tr}$ being the transpose of $\sigma$. The
$\mathrm{trace}(a) = \sum_i a_{ii}$. The matrix $a = \sigma\sigma^{Tr}$ is called the diffusion matrix.
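The norms and the diffusion matrix just defined can be computed directly. The sketch below uses a hypothetical constant $3 \times 2$ matrix $\sigma$ and vector $b$, chosen only for illustration, and also checks that the matrix norm defined through the trace coincides with the Frobenius norm.

```python
import numpy as np

# Hypothetical coefficients at a fixed (x, t): n = 3, d = 2.
sigma = np.array([[1.0, 0.0],
                  [0.5, 2.0],
                  [0.0, 1.0]])
b = np.array([1.0, -2.0, 2.0])

a = sigma @ sigma.T                      # diffusion matrix a = sigma sigma^Tr (n x n)
norm_b = np.sqrt(np.sum(b ** 2))         # |b| = length of the vector
norm_sigma = np.sqrt(np.trace(a))        # |sigma|^2 = trace(sigma sigma^Tr)

print(a)
print(norm_b, norm_sigma, np.linalg.norm(sigma, "fro"))
```

Note that $a$ is always symmetric and non-negative definite, whatever the shape of $\sigma$.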


Theorem 6.30 If the coefficients are locally Lipschitz in $x$ with a constant
independent of $t$, that is, for every $N$, there is a constant $K$ depending only
on $T$ and $N$ such that for all $|x|, |y| \le N$ and all $0 \le t \le T$

$$|b(x,t) - b(y,t)| + |\sigma(x,t) - \sigma(y,t)| < K|x - y|, \tag{6.74}$$

then for any given $X(0)$ the strong solution to SDE (6.71) is unique. If in
addition to condition (6.74) the linear growth condition holds

$$|b(x,t)| + |\sigma(x,t)| \le K_T(1 + |x|),$$

$X(0)$ is independent of $B$, and $E|X(0)|^2 < \infty$, then the strong solution exists
and is unique on $[0,T]$, moreover

$$E\Big(\sup_{0 \le t \le T} |X(t)|^2\Big) < C\big(1 + E|X(0)|^2\big), \tag{6.75}$$

where constant $C$ depends only on $K$ and $T$.
Note that unlike in the one-dimensional case, the Lipschitz condition on $\sigma$ cannot be weakened in general to a Hölder condition, i.e. there is no Yamada-Watanabe-type result for multi-dimensional SDEs.

The quadratic covariation is easy to work out from (6.72), by taking into 
account that independent Brownian motions have zero quadratic covariation. 


$$d[X_i, X_j](t) = dX_i(t)\,dX_j(t) = a_{ij}(X(t), t)\,dt. \tag{6.76}$$

It can be shown that if $X(t)$ is a solution to (6.71) then

$$E\big(X_i(t+\Delta) - x_i \,\big|\, X(t) = x\big) = b_i(x,t)\Delta + o(\Delta),$$

$$E\big((X_i(t+\Delta) - x_i)(X_j(t+\Delta) - x_j) \,\big|\, X(t) = x\big) = a_{ij}(x,t)\Delta + o(\Delta),$$

as $\Delta \to 0$. Thus $b(x,t)$ is the coefficient in the infinitesimal mean of the displacement from point $x$ at time $t$, and $a(x,t)$ is approximately the coefficient in the infinitesimal covariance of the displacement.


Weak solutions can be defined as solutions to the martingale problem. Let the operator $L_t$, acting on twice continuously differentiable functions from $\mathbb{R}^n$ to $\mathbb{R}$, be

$$L_t = \sum_{i=1}^{n} b_i(x,t)\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}(x,t)\frac{\partial^2}{\partial x_i \partial x_j}. \tag{6.77}$$

Note that $L_t$ depends on $\sigma$ only through $a$. Then $X(t)$ is a weak solution started at $x$ at time $s$, if

$$f(X(t)) - \int_s^t (L_u f)(X(u))\,du \tag{6.78}$$

is a martingale for any twice continuously differentiable function $f$ vanishing outside a compact set in $\mathbb{R}^n$. This process is called a diffusion with generator $L_t$. In the case of time-independent coefficients, the process is a time-homogeneous diffusion with generator $L$.


Theorem 6.31 Assume that $a(x,t)$ is continuous and satisfies condition (A)

$$\text{(A)} \quad \sum_{i,j=1}^{n} a_{ij}(x,t)\,v_i v_j > 0, \quad \text{for all } x \in \mathbb{R}^n \text{ and } v \ne 0,$$

and $b(x,t)$ is bounded on bounded sets. Then there exists a unique weak solution up to the time of explosion. If, in addition, the linear growth condition is satisfied, that is, for any $T > 0$ there is $K_T$ such that for all $x \in \mathbb{R}^n$

$$|b(x,t)| + |a(x,t)| \le K_T(1 + |x|), \tag{6.79}$$

then there exists a unique weak solution to the martingale problem (6.78) starting at any point $x \in \mathbb{R}^n$ at any time $s \ge 0$; moreover this solution has the strong Markov property.


Since the weak solution is defined in terms of the generator, which itself depends on $\sigma$ only through $a$, the weak solution to (6.71) can be constructed using a single Brownian motion provided the matrix $a$ remains the same. If a single SDE is equivalent to a number of SDEs, heuristically, it means that there is as much randomness in a $d$-dimensional Brownian motion as there is in a single Brownian motion. Replacement of a system of SDEs by a single one is shown in detail for the Bessel process.

Note that the equation $\sigma\sigma^T = a$ has many solutions for $\sigma$; the matrix square root is non-unique. However, if $a(x,t)$ is non-negative definite for all $x$ and $t$, and has for entries twice continuously differentiable functions of $x$ and $t$, then it has a locally Lipschitz square root $\sigma(x,t)$ of the same dimension as $a(x,t)$ (see for example Friedman (1975) Theorem 6.1.2).
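The non-uniqueness of the square root is easy to see numerically. The sketch below (the helper names are hypothetical and the matrices are arbitrary illustrative choices) checks that the identity matrix and any rotation matrix give the same $a$, and that a $1 \times 2$ row $(0.6, 0.8)$ gives the same scalar $a = 1$ as the $1 \times 1$ matrix $(1)$, so even the number of driving Brownian motions is not determined by $a$.

```python
import math


def diffusion_matrix(s):
    """Return a = sigma sigma^T for an n x d nested list s."""
    n, d = len(s), len(s[0])
    return [[sum(s[i][k] * s[j][k] for k in range(d)) for j in range(n)]
            for i in range(n)]


def same_diffusion_matrix(s1, s2, tol=1e-12):
    """True if s1 and s2 produce the same diffusion matrix up to tol."""
    a1, a2 = diffusion_matrix(s1), diffusion_matrix(s2)
    if len(a1) != len(a2):
        return False
    n = len(a1)
    return all(abs(a1[i][j] - a2[i][j]) <= tol for i in range(n) for j in range(n))


def rotation(theta):
    """2 x 2 rotation matrix; any orthogonal matrix U satisfies U U^T = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
```

For example, `same_diffusion_matrix(IDENTITY, rotation(0.7))` and `same_diffusion_matrix([[1.0]], [[0.6, 0.8]])` both hold, so the corresponding SDEs share a generator and hence a weak solution.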


Bessel Process 


Let $B(t) = (B_1(t), B_2(t), \ldots, B_d(t))$ be the $d$-dimensional Brownian motion, $d \ge 2$. Denote by $R(t)$ its squared distance from the origin, that is,

$$R(t) = \sum_{i=1}^{d} B_i^2(t). \tag{6.80}$$

The SDE for $R(t)$ is given by (using $d(B^2(t)) = 2B(t)\,dB(t) + dt$)

$$dR(t) = d\,dt + 2\sum_{i=1}^{d} B_i(t)\,dB_i(t). \tag{6.81}$$

In this case we have one equation driven by $d$ independent Brownian motions. Clearly, $b(x) = d$, $\sigma(x)$ is the $(1 \times d)$ matrix $2(B_1(t), B_2(t), \ldots, B_d(t))$, so that $a(X(t)) = \sigma(X(t))\sigma^T(X(t)) = 4\sum_{i=1}^{d} B_i^2(t) = 4R(t)$ is a scalar. Thus the generator of $X(t)$ is given by $L = d\,\frac{d}{dx} + \frac{1}{2}(4x)\frac{d^2}{dx^2}$. But the same generator corresponds to the process $X(t)$ satisfying the SDE below driven by a single Brownian motion

$$dX(t) = d\,dt + 2\sqrt{X(t)}\,dB(t). \tag{6.82}$$

Therefore the squared distance process $R(t)$ satisfies SDE (6.82) in the weak sense. This SDE was considered in Example 6.14. The Bessel process is defined as the distance from the origin, $Z(t) = \sqrt{\sum_{i=1}^{d} B_i^2(t)} = \sqrt{R(t)}$. Since $R(t)$ has the same distribution as $X(t)$ given by (6.82), by Itô's formula $Z(t)$ satisfies

$$dZ(t) = \frac{d-1}{2Z(t)}\,dt + dB(t). \tag{6.83}$$
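The step from (6.82) to (6.83) can be spelled out as follows. This is a routine application of the one-dimensional Itô formula to $f(x) = \sqrt{x}$, written here as a sketch; it is valid as long as $X(t) > 0$, which holds for $d \ge 2$ since the process does not reach zero.

```latex
% Z(t) = f(X(t)) with f(x) = \sqrt{x} and X(t) solving (6.82)
f'(x) = \frac{1}{2\sqrt{x}}, \qquad
f''(x) = -\frac{1}{4x^{3/2}}, \qquad
d[X,X](t) = 4X(t)\,dt .

dZ(t) = f'(X(t))\,dX(t) + \tfrac{1}{2} f''(X(t))\,d[X,X](t)
      = \frac{1}{2Z(t)}\bigl(d\,dt + 2Z(t)\,dB(t)\bigr)
        - \frac{1}{8Z^{3}(t)}\,4Z^{2}(t)\,dt
      = \frac{d-1}{2Z(t)}\,dt + dB(t).
```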
Using the one-dimensional SDE (6.82) for $R(t)$ we can decide on the recurrence, transience and attainability of 0 of Brownian motion in dimensions 2 and higher. It follows from Example 6.14 that in one and two dimensions Brownian motion is recurrent, but in dimensions three and higher it is transient. It was also shown there that in dimension two and above Brownian motion never visits zero. See also Karatzas and Shreve (1988), p.161-163.


Itô's Formula, Dynkin's Formula


Let $X(t) = (X_1(t), \ldots, X_n(t))$ be a diffusion in $\mathbb{R}^n$ with generator $L$ (the general case is similar, but in what follows the time-homogeneous case will be considered). Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable ($C^2$) function. Then Itô's formula states that

$$df(X(t)) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(X(t))\,dX_i(t) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}(X(t))\frac{\partial^2 f}{\partial x_i \partial x_j}(X(t))\,dt. \tag{6.84}$$

It can be regarded as a Taylor's formula expansion where the products $dX_i(t)\,dX_j(t)$ are replaced by $a_{ij}(X(t))\,dt$, as in (6.76).

Itô's formula can be written with the help of the generator as

$$df(X(t)) = Lf(X(t))\,dt + \sum_{i=1}^{n}\sum_{j=1}^{d} \frac{\partial f}{\partial x_i}(X(t))\,\sigma_{ij}(X(t))\,dB_j(t). \tag{6.85}$$

The analogues of Theorems 6.3 and 6.15 hold. It is clear from (6.85) that if the partial derivatives of $f$ are bounded, and $\sigma(x)$ is bounded, then

$$f(X(t)) - \int_0^t Lf(X(u))\,du \tag{6.86}$$

is a martingale. (Without the assumption that these functions are bounded, it is a local martingale.)


Theorem 6.32 Suppose that the assumptions of Theorem 6.31 hold. Let $D \subset \mathbb{R}^n$ be a bounded domain (an open and simply connected set) in $\mathbb{R}^n$. Let $X(0) = x$ and denote by $\tau_D$ the exit time from $D$, $\tau_D = \inf\{t > 0 : X(t) \in \partial D\}$. Then for any twice continuously differentiable $f$,

$$f(X(t \wedge \tau_D)) - \int_0^{t \wedge \tau_D} Lf(X(u))\,du \tag{6.87}$$

is a martingale.
It can be shown that under the conditions of the above theorem

$$\sup_{x \in D} E_x(\tau_D) < \infty. \tag{6.88}$$


As a corollary the following is obtained 


Theorem 6.33 Suppose that the assumptions of Theorem 6.31 hold. If $f$ is twice continuously differentiable in $D$, continuous on $\partial D$, and solves

$$Lf = -\phi \ \text{ in } D \quad \text{and} \quad f = g \ \text{ on } \partial D, \tag{6.89}$$

for some bounded functions $g$ and $\phi$, then $f(x)$, $x \in D$, has the representation

$$f(x) = E_x\big(g(X(\tau_D))\big) + E_x\Big(\int_0^{\tau_D} \phi(X(s))\,ds\Big). \tag{6.90}$$

In particular, if $\phi \equiv 0$, the solution has the representation

$$f(x) = E_x\big(g(X(\tau_D))\big).$$

The proof is exactly the same as for Theorem 6.21 in one dimension.
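As an illustration of the representation (6.90), consider the mean exit time of $n$-dimensional Brownian motion from a ball. The following worked computation is a standard example added here as a sketch, not taken from the text; it assumes $L = \frac{1}{2}\sum_i \partial^2/\partial x_i^2$, $D = \{x : |x| < r\}$, $\phi \equiv 1$ and $g \equiv 0$.

```latex
% Candidate solution of Lf = -1 in D, f = 0 on \partial D:
f(x) = \frac{r^{2} - |x|^{2}}{n}, \qquad
\frac{\partial^{2} f}{\partial x_i^{2}} = -\frac{2}{n}
\;\Longrightarrow\;
Lf = \frac{1}{2}\sum_{i=1}^{n}\Bigl(-\frac{2}{n}\Bigr) = -1,
\qquad f\big|_{|x| = r} = 0 .

% Hence, by (6.90) with \phi \equiv 1 and g \equiv 0:
E_x(\tau_D) = E_x\Bigl(\int_0^{\tau_D} 1\,ds\Bigr) = f(x) = \frac{r^{2} - |x|^{2}}{n}.
```

For $n = 1$ this reduces to the familiar $E_x \tau = r^2 - x^2$ for exit from $(-r, r)$.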


Definition 6.34 A function $f(x)$ is said to be $L$-harmonic on $D$ if it is twice continuously differentiable on $D$ and $Lf(x) = 0$ for $x \in D$.


The following result follows from (6.86). 


Corollary 6.35 For any bounded $L$-harmonic function $f$ on $D$ with bounded derivatives, $f(X(t \wedge \tau_D))$ is a martingale.


Example 6.17: Denote by $\Delta = \frac{1}{2}\sum_{i=1}^{3} \frac{\partial^2}{\partial x_i^2}$ the three-dimensional Laplacian (taken here with the factor $\frac{1}{2}$). This operator is the generator of three-dimensional Brownian motion $B(t)$, $L = \Delta$. Let $D = \{x : |x| > r\}$. Then $f(x) = 1/|x|$ is harmonic on $D$, that is, $Lf(x) = 0$ for all $x \in D$. To see this, perform the differentiation and verify that $\sum_{i=1}^{3} \frac{\partial^2}{\partial x_i^2}\Big(\frac{1}{\sqrt{x_1^2 + x_2^2 + x_3^2}}\Big) = 0$ at any point $x \ne 0$. Note that in one dimension all harmonic functions for the Laplacian are linear functions, whereas in higher dimensions there are many more. It is easy to see that $1/|x|$ and its derivatives are bounded on $D$; consequently, if $B(0) = x \ne 0$, then $1/|B(t \wedge \tau_D)|$ is a martingale.
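The differentiation in Example 6.17 can be checked numerically. The sketch below approximates the Laplacian by central second differences (a generic finite-difference scheme, with an arbitrarily chosen step `h`; the function names are illustrative) and applies it to $1/|x|$. In three dimensions the result is zero up to discretization error, while in two dimensions the same function is not harmonic: there the sum of second derivatives equals $1/|x|^3$.

```python
import math


def inv_norm(x):
    """f(x) = 1/|x| for a point x given as a list of coordinates."""
    return 1.0 / math.sqrt(sum(c * c for c in x))


def laplacian(func, x, h=1e-3):
    """Central-difference approximation of sum_i d^2 func / dx_i^2 at x."""
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        total += (func(xp) - 2.0 * func(x) + func(xm)) / (h * h)
    return total
```

For instance, `laplacian(inv_norm, [1.0, 2.0, 2.0])` is close to 0, whereas `laplacian(inv_norm, [3.0, 4.0])` is close to `1 / 125`.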


The Backward (Kolmogorov's) equation in higher dimensions is the same as in one dimension, with the obvious replacement of the state variable $x \in \mathbb{R}^n$. We have seen that solutions to the backward equation can be expressed by means of a diffusion (Theorem 6.33). However, it is a formidable task to prove that if $X(t)$ is a diffusion, and $g(x)$ is a smooth function on $D$, then $f(x) = E_x\big(g(X(t))\big)$ solves the backward equation (see for example, Friedman (1975)).
Theorem 6.36 Let $g(x)$ be a function with two continuous derivatives satisfying a polynomial growth condition, that is, the function and its derivatives in absolute value do not exceed $K(1 + |x|^m)$ for some constants $K, m > 0$. Let $X(t)$ satisfy (6.71). Assume that the coefficients $b(x,t)$, $\sigma(x,t)$ are Lipschitz in $x$ uniformly in $t$, satisfy the linear growth condition, and their two derivatives satisfy a polynomial growth condition. Let

$$f(x,t) = E\big(g(X(T)) \,\big|\, X(t) = x\big). \tag{6.91}$$

Then $f$ has continuous derivatives in $x$, which can be computed by differentiating (6.91) under the expectation sign. Moreover $f$ has a continuous derivative in $t$, and solves the backward PDE

$$\frac{\partial f}{\partial t} + L_t f = 0 \ \text{ in } \mathbb{R}^n \times [0,T), \qquad f(x,t) \to g(x) \ \text{ as } t \uparrow T. \tag{6.92}$$

The fundamental solution of (6.92) gives the transition probability function of the diffusion (6.71).


Remark 6.7: (Diffusions on manifolds) 

The PDEs above can also be considered when the state variable $x$ belongs to a manifold, rather than $\mathbb{R}^n$. The fundamental solution then corresponds to the diffusion on the manifold and represents the way heat propagates on that manifold. It turns out that various geometric properties of the manifold can be obtained from the properties of the fundamental solution, Molchanov (1975).


Remark 6.8: The Feynman-Kac formula holds also in the multi-dimensional case in the same way as in one dimension. If 0 in the right hand side of the PDE (6.92) is replaced by $rf$, for a bounded function $r$, then the solution satisfying a given boundary condition $f(x,T) = g(x)$ has a representation

$$f(x,t) = E\Big(e^{-\int_t^T r(X(u),u)\,du}\, g(X(T)) \,\Big|\, X(t) = x\Big).$$

See Karatzas and Shreve (1988).


Recurrence, Transience and Stationary Distributions 


Properties of recurrence and transience of multi-dimensional diffusions, solutions to (6.71), are defined similarly to the one-dimensional case. However, in higher dimensions a diffusion $X(t)$ is recurrent if for any starting point $x \in \mathbb{R}^n$ the process will visit a ball $D_\epsilon(y)$ of radius $\epsilon$ around any point $y \in \mathbb{R}^n$, however small, with probability one:

$$P_x\big(X(t) \in D_\epsilon(y) \text{ for a sequence of } t\text{'s increasing to infinity}\big) = 1.$$
A diffusion $X(t)$ on $\mathbb{R}^n$ is transient if for any starting point $x \in \mathbb{R}^n$ the process will leave any ball, however large, never to return. It follows by a diffusion analysis of the squared lengths of the multi-dimensional Brownian motion (see Example 6.14) that in dimensions one and two Brownian motion is recurrent, but it is transient in dimensions three and higher.

For time-homogeneous diffusions under the conditions of Theorem 6.31 on the coefficients, recurrence is equivalent to the property that the process started at any point $x$ hits the closed ball $D_\epsilon(y)$ around any point $y$ in finite time. Under these conditions there is a dichotomy: a diffusion is either transient or recurrent. Invariant measures are defined in exactly the same way as in one dimension. Stationary distributions are finite invariant measures; they may exist only if a diffusion is recurrent. A diffusion is recurrent and admits a stationary distribution if and only if the expected hitting time of $D_\epsilon(y)$ from $x$ is finite. When this property holds the diffusion is also called ergodic or positive recurrent.

In general there are no necessary and sufficient conditions for recurrence and ergodicity for multi-dimensional diffusions; however, there are various tests for these properties. The method of Lyapunov functions, developed by R.Z. Khasminskii, consists of finding a suitable function $f$ such that $Lf \le 0$ outside a ball around zero. If $\lim_{|x| \to \infty} f(x) = \infty$, then the process is recurrent. If $f$ is ultimately decreasing, then the process is transient. (For Brownian motion, $f(x) = \log|x|$ in dimension two and $f(x) = 1/|x|$ in dimension three illustrate the two cases.) If $Lf \le -\epsilon$ for some $\epsilon > 0$ outside a ball around zero, with $f(x)$ bounded from below in that domain, then the diffusion is positive recurrent. Proofs consist of an application of Itô's formula coupled with martingale theory (the convergence property of supermartingales). For details see Bhattacharya (1978), Hasminskii (1980), Pinsky (1995).
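A short worked instance of the Lyapunov test for positive recurrence may help. The diffusion chosen below, an $n$-dimensional Ornstein-Uhlenbeck-type process, and the test function $f(x) = |x|^2$ are illustrative assumptions, not an example from the text.

```latex
% Assumed model: dX(t) = -X(t)\,dt + dB(t) in R^n, so that
L = -\sum_{i=1}^{n} x_i \frac{\partial}{\partial x_i}
    + \frac{1}{2}\sum_{i=1}^{n} \frac{\partial^{2}}{\partial x_i^{2}} .

% Test function f(x) = |x|^2 = \sum_i x_i^2, bounded from below by 0:
Lf(x) = -\sum_{i=1}^{n} x_i (2x_i) + \frac{1}{2}\,(2n) = -2|x|^{2} + n .

% Hence, for any \epsilon > 0,
Lf(x) \le -\epsilon \quad \text{whenever} \quad |x|^{2} \ge \frac{n + \epsilon}{2},
```

so the criterion indicates that this diffusion is positive recurrent.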


Higher Order Random Differential Equations 


Similarly to ODEs, higher order random differential equations have interpretations as multi-dimensional SDEs. For example, a second order random differential equation of the form

$$\ddot{x} + h(x, \dot{x}) = \dot{B}, \tag{6.93}$$

where $\dot{x}(t) = dx(t)/dt$, $\ddot{x}(t) = d^2x(t)/dt^2$, and $\dot{B}$ denotes the white noise, has interpretation as the following two-dimensional SDE by letting

$$X_1(t) = x(t), \tag{6.94}$$

$$X_2(t) = dX_1(t)/dt, \tag{6.95}$$

$$dX_1(t) = X_2(t)\,dt, \tag{6.96}$$

$$dX_2(t) = -h(X_1(t), X_2(t))\,dt + dB(t). \tag{6.97}$$
Such equations are considered in Section 14.2 of Chapter 14.

Higher $n$-th order random equations are interpreted in a similar way: by letting $X_1(t) = x(t)$ and $dX_i(t) = X_{i+1}(t)\,dt$, $i = 1, \ldots, n-1$, an $n$-dimensional SDE is obtained.
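The reduction of a second order equation to a first order system can be sketched in code as follows. This is an explicit Euler discretization of the pair (position, velocity); the function name `simulate_second_order` and the extra parameter `noise_scale` (which multiplies the Brownian increment and is set to 1 to match the white-noise equation, or to 0 to recover the deterministic ODE for checking) are assumptions made for illustration.

```python
import math
import random


def simulate_second_order(h, x0, v0, T, steps, noise_scale=1.0, rng=None):
    """Euler scheme for x'' + h(x, x') = white noise, written as a 2-d system.

    State: X1 = position, X2 = velocity, with
        dX1 = X2 dt,
        dX2 = -h(X1, X2) dt + noise_scale * dB.
    Returns the pair (X1(T), X2(T)) of the discretized path.
    """
    rng = rng or random.Random()
    dt = T / steps
    sqrt_dt = math.sqrt(dt)
    x1, x2 = x0, v0
    for _ in range(steps):
        dB = noise_scale * rng.gauss(0.0, sqrt_dt)
        # Both coordinates are updated from the values at the start of the step.
        x1, x2 = x1 + x2 * dt, x2 - h(x1, x2) * dt + dB
    return x1, x2
```

With `h = lambda x, v: x` and `noise_scale=0.0` the scheme approximates the harmonic oscillator, whose solution started from position 1 and velocity 0 is `(cos t, -sin t)`.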


Notes. Most of the material can be found in Friedman (1975), Gihman and 
Skorohod (1982), Stroock and Varadhan (1979), Karatzas and Shreve (1988), 
Rogers and Williams (1990), Pinsky (1995). 


6.11 Exercises 


Exercise 6.1: Show that for any $u$, $f(x,t) = \exp(ux - u^2t/2)$ solves the backward equation for Brownian motion. Take derivatives, first, second, etc., of $\exp(ux - u^2t/2)$ with respect to $u$, and set $u = 0$, to obtain that the functions $x$, $x^2 - t$, $x^3 - 3tx$, $x^4 - 6tx^2 + 3t^2$, etc. also solve the backward equation (6.13). Deduce that $B^2(t) - t$, $B^3(t) - 3tB(t)$, $B^4(t) - 6tB^2(t) + 3t^2$ are martingales.

Exercise 6.2: Find the generator for the Ornstein-Uhlenbeck process, write the backward equation and give its fundamental solution. Verify that it satisfies the forward equation.


Exercise 6.3: Let $X(t)$ be a stationary process. Show that the covariance function $\gamma(s,t) = \mathrm{Cov}(X(s), X(t))$ is a function of $|t - s|$ only. Hint: take $k = 2$. Deduce that for Gaussian processes stationarity is equivalent to the requirements that the mean function is a constant and the covariance function is a function of $|t - s|$.

Exercise 6.4: $X(t)$ is a diffusion with coefficients $\mu(x) = cx$ and $\sigma(x) = 1$. Give its generator and show that $X^2(t) - 2c\int_0^t X^2(s)\,ds - t$ is a martingale.

Exercise 6.5: $X(t)$ is a diffusion with $\mu(x) = 2x$ and $\sigma^2(x) = 4x$. Give its generator $L$. Solve $Lf = 0$, and give a martingale $M_f$. Find the SDE for the process $Y(t) = \sqrt{X(t)}$, and give the generator of $Y(t)$.


Exercise 6.6: Find f(x) such that f(B(t) + t) is a martingale. 


Exercise 6.7: $X(t)$ is a diffusion with coefficients $\mu(x,t)$, $\sigma(x,t)$. Find a differential equation for $f(x,t)$ such that $Y(t) = f(X(t),t)$ has infinitesimal diffusion coefficient equal to 1.


Exercise 6.8: Show that the mean exit time of a diffusion from an interval, which (by Theorem 6.16) satisfies the ODE (6.44), is given by

$$v(x) = -\int_a^x 2G(y)\int_a^y \frac{ds}{\sigma^2(s)G(s)}\,dy + \int_a^b 2G(y)\int_a^y \frac{ds}{\sigma^2(s)G(s)}\,dy \; \frac{\int_a^x G(s)\,ds}{\int_a^b G(s)\,ds},$$

where $G(x) = \exp\Big(-\int_a^x \frac{2\mu(s)}{\sigma^2(s)}\,ds\Big)$.

Exercise 6.9: Find $P_x(T_b < T_a)$ for Brownian motion with drift when $\mu(x) = \mu$ and $\sigma^2(x) = \sigma^2$.


Exercise 6.10: Give a probabilistic representation of the solution $f(x,t)$ of the PDE

$$\frac{1}{2}\frac{\partial^2 f}{\partial x^2} + \frac{\partial f}{\partial t} = 0, \quad 0 \le t \le T, \quad f(x,T) = x^2.$$

Solve this PDE using the solution of the corresponding stochastic differential equation.


Exercise 6.11: Show that the solution of the following ordinary differential equation $dx(t) = cx^r(t)\,dt$, $c > 0$, $x(0) = x_0 > 0$, explodes if and only if $r > 1$.


Exercise 6.12: Investigate for explosions the following process
$dX(t) = X^2(t)\,dt + \sigma X^{\alpha}(t)\,dB(t)$.


Exercise 6.13: Show that Brownian motion B(t) is recurrent. Show that 
B(t) + t is transient. 


Exercise 6.14: Show that the Ornstein-Uhlenbeck process is positively recurrent. Show that the limiting distribution for the Ornstein-Uhlenbeck process (5.6) exists, and is given by its stationary distribution. Hint: the distribution of $\sigma e^{-\alpha t}\int_0^t e^{\alpha s}\,dB_s$ is Normal; find its mean and variance, and take limits.


Exercise 6.15: Show that the square of the Bessel process $X(t)$ in (6.64) comes arbitrarily close to zero when $n = 2$, that is, $P(T_y < \infty) = 1$ for any small $y > 0$, but when $n \ge 3$, $P(T_y < \infty) < 1$.


Exercise 6.16: Let diffusion $X(t)$ have $\sigma(x) = 1$, $\mu(x) = -1$ for $x < 0$, $\mu(x) = 1$ for $x > 0$ and $\mu(0) = 0$. Show that $\pi(x) = e^{-|x|}$ is a stationary distribution for $X$.


Exercise 6.17: Let a diffusion on $(\alpha, \beta)$ be such that the transition probability density $p(t,x,y)$ is symmetric in $x$ and $y$, $p(t,x,y) = p(t,y,x)$ for all $x$, $y$ and $t$. Show that if $(\alpha, \beta)$ is a finite interval, then the uniform distribution is invariant for the process $X(t)$.


Exercise 6.18: Investigate for absorption at zero the following process (used as a model for interest rates, the square root model of Cox, Ingersoll and Ross): $dX(t) = b(a - X(t))\,dt + \sigma\sqrt{X(t)}\,dB(t)$, where the parameters $b$, $a$ and $\sigma$ are constants.
Chapter 7


Martingales 


Martingales play a central role in the modern theory of stochastic processes 
and stochastic calculus. Martingales constructed from a Brownian motion 
were considered in Section 3.3 and martingales arising in diffusions in Section 
6.1. Martingales have a constant expectation, which remains the same under 
random stopping. Martingales converge almost surely. Stochastic integrals are 
martingales. These are the most important properties of martingales, which 
hold under some conditions. 


7.1 Definitions 


The main ingredient in the definition of a martingale is the concept of condi- 
tional expectation, consult Chapter 2 for its definition and properties. 


Definition 7.1 A stochastic process $M(t)$, where time $t$ is continuous, $0 \le t \le T$, or discrete, $t = 0, 1, \ldots, T$, adapted to a filtration $\mathbb{F} = (\mathcal{F}_t)$ is a martingale if for any $t$, $M(t)$ is integrable, that is, $E|M(t)| < \infty$, and for any $t$ and $s$ with $0 \le s < t \le T$,

$$E(M(t)\,|\,\mathcal{F}_s) = M(s) \quad \text{a.s.} \tag{7.1}$$

$M(t)$ is a martingale on $[0, \infty)$ if it is integrable and the martingale property (7.1) holds for any $0 \le s < t < \infty$.


Definition 7.2 A stochastic process $X(t)$, $t \ge 0$, adapted to a filtration $\mathbb{F}$ is a supermartingale (submartingale) if it is integrable, and for any $t$ and $s$, $0 \le s < t \le T$,

$$E(X(t)\,|\,\mathcal{F}_s) \le X(s), \quad \big(E(X(t)\,|\,\mathcal{F}_s) \ge X(s)\big) \quad \text{a.s.}$$
If $X(t)$ is a supermartingale, then $-X(t)$ is a submartingale. The mean of a supermartingale is non-increasing with $t$, the mean of a submartingale is non-decreasing in $t$, and the mean of a martingale is constant in $t$. This property is used in a test for a super(sub)martingale to be a true martingale.


Theorem 7.3 A supermartingale $M(t)$, $0 \le t \le T$, is a martingale if and only if $EM(T) = EM(0)$.


PROOF: If $M$ is a martingale, then $EM(T) = EM(0)$ follows by the martingale property with $s = 0$ and $t = T$. Conversely, suppose $M(t)$ is a supermartingale and $EM(T) = EM(0)$. If for some $t$ and $s$ we have a strict inequality, $E(M(t)\,|\,\mathcal{F}_s) < M(s)$ on a set of positive probability, then by taking expectations we obtain $EM(t) < EM(s)$. Since the expectation of a supermartingale is non-increasing, $EM(T) \le EM(t) < EM(s) \le EM(0)$. But this contradicts the condition of the theorem, $EM(T) = EM(0)$. Thus for all $t$ and $s$ the inequality $E(M(t)\,|\,\mathcal{F}_s) \le M(s)$ must be an equality almost surely.


We refer to Theorem 2.32 on the existence of the regular right-continuous 
version for supermartingales. Regular right-continuous versions of processes 
will be taken. 


Square Integrable Martingales 


A special role in the theory of integration is played by square integrable mar- 
tingales. 


Definition 7.4 A random variable $X$ is square integrable if $E(X^2) < \infty$. A process $X(t)$ on the time interval $[0,T]$, where $T$ can be infinite, is square integrable if $\sup_{t \in [0,T]} EX^2(t) < \infty$ ($\sup_{t \ge 0} EX^2(t) < \infty$), i.e. second moments are bounded.


Example 7.1:

1. Brownian motion $B(t)$ on a finite time interval $0 \le t \le T$ is a square integrable martingale, since $EB^2(t) = t \le T < \infty$. Similarly, $B^2(t) - t$ is a square integrable martingale. They are not square integrable when $T = \infty$.

2. If $f(x)$ is a bounded and continuous function on $\mathbb{R}$, then the Itô integrals $\int_0^t f(B(s))\,dB(s)$ and $\int_0^t f(s)\,dB(s)$ are square integrable martingales on any finite time interval $0 \le t \le T$. Indeed, by (4.7), an Itô integral is a martingale, and since $|f(x)| \le K$,

$$E\Big(\int_0^t f(B(s))\,dB(s)\Big)^2 = E\Big(\int_0^t f^2(B(s))\,ds\Big) \le K^2 t \le K^2 T < \infty.$$

If moreover $\int_0^\infty f^2(s)\,ds < \infty$, then $\int_0^t f(s)\,dB(s)$ is a square integrable martingale on $[0, \infty)$.
7.2 Uniform Integrability


To appreciate the definition of uniform integrability of a process, recall what is meant by integrability of a random variable $X$. It is called integrable if $E|X| < \infty$. It is easy to see that this holds if and only if

$$\lim_{n \to \infty} E\big(|X|\,I(|X| > n)\big) = 0. \tag{7.2}$$

Indeed, if $X$ is integrable then (7.2) holds by dominated convergence, since $\lim_{n \to \infty} |X|\,I(|X| > n) = 0$ and $|X|\,I(|X| > n) \le |X|$. Conversely, let $n$ be large enough for the expectation in (7.2) to be finite. Then

$$E|X| = E\big(|X|\,I(|X| > n)\big) + E\big(|X|\,I(|X| \le n)\big) < \infty,$$

since the first term is finite by (7.2) and the second is bounded by $n$.


Definition 7.5 A process $X(t)$, $0 \le t \le T$, is called uniformly integrable if $E\big(|X(t)|\,I(|X(t)| > n)\big)$ converges to zero as $n \to \infty$ uniformly in $t$, that is,

$$\lim_{n \to \infty} \sup_t E\big(|X(t)|\,I(|X(t)| > n)\big) = 0, \tag{7.3}$$

where the supremum is over $[0,T]$ in the case of a finite time interval and over $[0, \infty)$ if the process is considered on $0 \le t < \infty$.


Example 7.2: We show that if $X(t)$, $0 \le t \le T$, is uniformly integrable, then it is integrable, that is, $\sup_t E|X(t)| < \infty$. Indeed,

$$\sup_t E|X(t)| \le \sup_t E\big(|X(t)|\,I(|X(t)| > n)\big) + n.$$

Since $X(t)$ is uniformly integrable, the first term converges to zero as $n \to \infty$; in particular it is bounded, and the result follows.
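The converse of Example 7.2 fails: a bounded first moment does not imply uniform integrability. The following standard counterexample is added as an illustration and is not taken from the text; $U$ denotes an assumed random variable uniformly distributed on $(0,1)$.

```latex
% Family indexed by t >= 1:
X(t) = t\, I\bigl(U < 1/t\bigr), \qquad t \ge 1 .

% First moments are bounded:
E|X(t)| = t \cdot P(U < 1/t) = 1 \quad \text{for all } t \ge 1 .

% But the tail expectations do not vanish uniformly:
E\bigl(|X(t)|\, I(|X(t)| > n)\bigr) =
\begin{cases}
1, & t > n, \\
0, & t \le n,
\end{cases}
\qquad\Longrightarrow\qquad
\sup_{t \ge 1} E\bigl(|X(t)|\, I(|X(t)| > n)\bigr) = 1 \ \text{ for every } n .
```

So (7.3) fails although $\sup_t E|X(t)| = 1 < \infty$.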


Sufficient conditions for uniform integrability are given next. 


Theorem 7.6 If the process $X$ is dominated by an integrable random variable, $|X(t)| \le Y$ and $E(Y) < \infty$, then it is uniformly integrable. In particular, if $E(\sup_t |X(t)|) < \infty$, then it is uniformly integrable.


PROOF: $E\big(|X(t)|\,I(|X(t)| > n)\big) \le E\big(|Y|\,I(|Y| > n)\big) \to 0$, as $n \to \infty$.


Note that there are uniformly integrable processes (martingales) which are not dominated by an integrable random variable, so that the sufficient condition for uniform integrability $E(\sup_t |X(t)|) < \infty$ is not necessary for uniform integrability. Another sufficient condition for uniform integrability is given by the following result; see for example Protter (1992), p.9, Liptser and Shiryaev (2001), p. 17.
Theorem 7.7 If for some positive, increasing, convex function $G(x)$ on $[0, \infty)$ such that $\lim_{x \to \infty} G(x)/x = \infty$,

$$\sup_{t \le T} E\big(G(|X(t)|)\big) < \infty, \tag{7.4}$$

then $X(t)$, $t \le T$, is uniformly integrable.


We omit the proof. In practice the above result is used with $G(x) = x^r$ for $r > 1$, and uniform integrability is checked by using moments. For second moments, $r = 2$, we have: square integrability implies uniform integrability.


Corollary 7.8 If $X(t)$ is square integrable, that is, $\sup_t EX^2(t) < \infty$, then it is uniformly integrable.


In view of this, examples of uniformly integrable martingales are provided 
by square integrable martingales given in Example 7.1. The following result 
provides a construction of uniformly integrable martingales. 


Theorem 7.9 (Doob's, Lévy's martingale) Let $Y$ be an integrable random variable, that is, $E|Y| < \infty$, and define

$$M(t) = E(Y\,|\,\mathcal{F}_t). \tag{7.5}$$

Then $M(t)$ is a uniformly integrable martingale.


PROOF: It is easy to see that $M(t)$ is a martingale. Indeed, by the law of double expectation, $E(M(t)\,|\,\mathcal{F}_s) = E\big(E(Y\,|\,\mathcal{F}_t)\,|\,\mathcal{F}_s\big) = E(Y\,|\,\mathcal{F}_s) = M(s)$. The proof of uniform integrability is more involved. It is enough to establish the result for $Y \ge 0$, as the general case will follow by considering $Y^+$ and $Y^-$. If $Y \ge 0$ then $M(t) \ge 0$ for all $t$. We show next that $M^* = \sup_{t \le T} M(t) < \infty$. If not, there is a sequence $t_n \uparrow \infty$ such that $M(t_n) \uparrow \infty$. By monotone convergence, $EM(t_n) \uparrow \infty$, which is a contradiction, as $EM(t_n) = EY < \infty$. Now, by the general definition of conditional expectation, see (2.16), $E\big(M(t)\,I(M(t) > n)\big) = E\big(Y\,I(M(t) > n)\big)$. Since $\{M(t) > n\} \subseteq \{M^* > n\}$, $E\big(Y\,I(M(t) > n)\big) \le E\big(Y\,I(M^* > n)\big)$. Thus $E\big(M(t)\,I(M(t) > n)\big) \le E\big(Y\,I(M^* > n)\big)$. Since the right-hand side does not depend on $t$, $\sup_{t \le T} E\big(M(t)\,I(M(t) > n)\big) \le E\big(Y\,I(M^* > n)\big)$. But this converges to zero as $n \to \infty$, because $M^*$ is finite and $Y$ is integrable.


The martingale in (7.5) is said to be closed by $Y$. An immediate corollary is


Corollary 7.10 Any martingale $M(t)$ on a finite time interval $0 \le t \le T < \infty$ is uniformly integrable and is closed by $M(T)$.


It will be seen in the next section that a uniformly integrable martingale on $[0, \infty)$ is also of the form (7.5). That is, there exists a random variable, called $M(\infty)$, such that the martingale property holds for all $0 \le s < t$, including $t = \infty$.
7.3 Martingale Convergence

In this section martingales on the infinite time interval $[0, \infty)$ are considered.


Theorem 7.11 (Martingale Convergence Theorem) If $M(t)$, $0 \le t < \infty$, is an integrable martingale (supermartingale or submartingale), that is, if $\sup_{t \ge 0} E|M(t)| < \infty$, then there exists an almost sure limit $\lim_{t \to \infty} M(t) = Y$, and $Y$ is an integrable random variable.


The proof of this result is due to Doob and it is too involved to be given here. If $M(t)$ is a martingale, then the condition $\sup_{t \ge 0} E|M(t)| < \infty$ is equivalent to any of the following conditions:


• $\lim_{t \to \infty} E|M(t)| < \infty$. This is because $|x|$ is a convex function, implying that $|M(t)|$ is a submartingale, and the expectation of a submartingale is an increasing function of $t$. Hence the supremum is the same as the limit.


• $\lim_{t \to \infty} EM^+(t) < \infty$. This is because $E|M(t)| = EM^+(t) + EM^-(t)$. If $EM(t) = c$, then $EM(t) = EM^+(t) - EM^-(t) = c$ and $EM^+(t) = EM^-(t) + c$.


• $\lim_{t \to \infty} EM^-(t) < \infty$.


If $M(t)$ is a submartingale, it is enough to demand $\sup_t EM^+(t) < \infty$, and if it is a supermartingale it is enough to demand $\sup_t EM^-(t) < \infty$, for the existence of a finite limit.


Corollary 7.12

1. Uniformly integrable martingales converge almost surely.

2. Square integrable martingales converge almost surely.

3. Positive martingales converge almost surely.

4. Submartingales bounded from above (negative) converge almost surely.

5. Supermartingales bounded from below (positive) converge almost surely.


PROOF: Since uniformly integrable martingales are integrable, they converge. Since square integrable martingales are uniformly integrable, they converge. If $M(t)$ is positive then $|M(t)| = M(t)$, and $E|M(t)| = EM(t) = EM(0) < \infty$.


Note that the expectations $EM(t)$ may or may not converge to the expectation of the limit $EY$ (see Example 7.3). The case when $EY = \lim_{t \to \infty} EM(t)$ is precisely when $M(t)$ is uniformly integrable. The next result, given without proof, shows that uniformly integrable martingales have the form (7.5).
Theorem 7.13 If $M(t)$ is a uniformly integrable martingale then it converges to a random variable $Y$ almost surely and in $L^1$. Conversely, if $M(t)$ is a martingale that converges in $L^1$ to a random variable $Y$, then $M(t)$ is uniformly integrable, and it converges almost surely to $Y$. In any case $M(t) = E(Y\,|\,\mathcal{F}_t)$.


Example 7.3: (Exponential martingale of Brownian motion.)

Let $M(t) = e^{B(t) - t/2}$. Then $M(t)$, $t \ge 0$, is a martingale. Since it is positive, it converges by Corollary 7.12 almost surely to a limit $Y$. By the Law of Large Numbers for Brownian motion, $B(t)/t$ converges almost surely to zero. Thus $M(t) = e^{t(B(t)/t - 1/2)} \to 0$ as $t \to \infty$, so $Y = 0$ almost surely. Therefore $M(t)$ is not uniformly integrable, as $EY = 0 \ne 1 = EM(t)$.
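The two facts used in Example 7.3, a constant mean equal to 1 and paths collapsing to zero, can both be evaluated numerically without simulation. The sketch below uses only the normal distribution of $B(t)$: `prob_M_exceeds` computes $P(M(t) > \epsilon) = P\big(B(t) > \ln\epsilon + t/2\big)$ through the complementary error function, and `mean_M` integrates $e^{\sqrt{t}z - t/2}$ against the standard normal density by the trapezoid rule. The function names, the grid size `m` and the integration window are illustrative assumptions.

```python
import math


def prob_M_exceeds(eps, t):
    """P(exp(B(t) - t/2) > eps), using B(t) ~ N(0, t)."""
    z = (math.log(eps) + t / 2.0) / math.sqrt(t)
    # P(Z > z) for standard normal Z, via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mean_M(t, m=4000, half_width=10.0):
    """E exp(B(t) - t/2) by the trapezoid rule over z, where B(t) = sqrt(t) * z."""
    centre = math.sqrt(t)  # the integrand is a Gaussian bump centred at sqrt(t)
    lo = centre - half_width
    h = 2.0 * half_width / m
    total = 0.0
    for k in range(m + 1):
        z = lo + k * h
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * math.exp(math.sqrt(t) * z - t / 2.0 - 0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)
```

Evaluating `prob_M_exceeds(0.01, t)` for increasing `t` shows the probability of staying above a fixed small level shrinking towards zero, while `mean_M(t)` stays at 1: the mass that keeps the mean constant sits on increasingly rare, increasingly large values.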


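Example 7.4: Let $f(s)$ be non-random, such that $\int_0^\infty f^2(s)\,ds < \infty$. We show that $M(t) = \int_0^t f(s)\,dB(s)$ is a uniformly integrable martingale and find a representation for the closing random variable.

Since $\int_0^\infty f^2(s)\,ds < \infty$, $\int_0^t f(s)\,dB(s)$ is defined for all $t > 0$ and is a martingale. Since $\sup_{t > 0} E(M^2(t)) = \sup_{t > 0} \int_0^t f^2(s)\,ds = \int_0^\infty f^2(s)\,ds < \infty$, $M(t)$ is uniformly integrable. Thus it converges almost surely to $Y$. Convergence is also in $L^1$, that is, $E|M(t) - Y| \to 0$ as $t \to \infty$. Denote $Y = M(\infty) = \int_0^\infty f(s)\,dB(s)$; then we have shown that $Y - M(t) = \int_t^\infty f(s)\,dB(s)$ converges to zero almost surely and in $L^1$. $Y$ is the closing variable. Indeed,

$$E(Y\,|\,\mathcal{F}_t) = E(M(\infty)\,|\,\mathcal{F}_t) = E\Big(\int_0^\infty f(s)\,dB(s)\,\Big|\,\mathcal{F}_t\Big) = \int_0^t f(s)\,dB(s) = M(t).$$


Example 7.5: Consider the bounded positive martingale $M(t) = E\big(I(Y > 0)\,|\,\mathcal{F}_t\big)$ with $Y = \int_0^\infty f(s)\,dB(s)$, where $f(s)$ is non-random and $\int_0^\infty f^2(s)\,ds < \infty$, from the previous example.

$$M(t) = E\big(I(Y > 0)\,|\,\mathcal{F}_t\big) = P(Y > 0\,|\,\mathcal{F}_t) = P\Big(\int_t^\infty f(s)\,dB(s) > -\int_0^t f(s)\,dB(s)\,\Big|\,\mathcal{F}_t\Big) = \Phi\left(\frac{\int_0^t f(s)\,dB(s)}{\sqrt{\int_t^\infty f^2(s)\,ds}}\right), \tag{7.6}$$

where the last equality is due to normality of the Itô integral for a non-random $f$. By taking $f$ to be zero on $(T, \infty)$, a result is obtained for martingales of the form $E\big(I(\int_0^T f(s)\,dB(s) > 0)\,|\,\mathcal{F}_t\big)$. In particular, by taking $f(s) = 1_{[0,T]}(s)$, we obtain that

$$\Phi\big(B(t)/\sqrt{T - t}\big)$$

is a positive bounded martingale on $[0, T]$. Its distribution for $t \le T$ is left as an exercise.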
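The martingale property of $\Phi\big(B(t)/\sqrt{T-t}\big)$ can be checked numerically for given $s < t < T$: conditionally on $B(s) = x$, $B(t) = x + \sqrt{t-s}\,Z$ with $Z$ standard normal, so the conditional expectation is a one-dimensional Gaussian integral. The sketch below evaluates it by the trapezoid rule; the function names and grid parameters are assumptions made for illustration.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def cond_expectation(x, s, t, T, m=4000, zmax=10.0):
    """E[ Phi(B(t)/sqrt(T-t)) | B(s) = x ] for s < t < T, by the trapezoid rule.

    Uses B(t) = x + sqrt(t - s) * Z with Z ~ N(0, 1) independent of the past.
    """
    h = 2.0 * zmax / m
    total = 0.0
    for k in range(m + 1):
        z = -zmax + k * h
        weight = 0.5 if k in (0, m) else 1.0
        value = Phi((x + math.sqrt(t - s) * z) / math.sqrt(T - t))
        total += weight * value * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)
```

For example, `cond_expectation(0.3, 0.2, 0.7, 1.0)` should agree with `Phi(0.3 / math.sqrt(0.8))`, the value of the process at time $s = 0.2$ when $B(s) = 0.3$.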
7.4 Optional Stopping


In this section we consider results on stopping martingales at random times. Recall that a random time $\tau$ is called a stopping time if for any $t > 0$ the sets $\{\tau \le t\} \in \mathcal{F}_t$. For filtrations generated by a process $X$, $\tau$ is a stopping time if it is possible to decide whether $\tau$ has occurred or not by observing the process up to time $t$. A martingale stopped at a random time $\tau$ is the process $M(t \wedge \tau)$. A Basic Stopping result, given here without proof, states that a martingale stopped at a stopping time is a martingale; in particular $EM(\tau \wedge t) = EM(0)$. This equation is used most frequently.


Theorem 7.14 If $M(t)$ is a martingale and $\tau$ is a stopping time, then the stopped process $M(\tau \wedge t)$ is a martingale. Moreover,

$$EM(\tau \wedge t) = EM(0). \tag{7.7}$$

This result was proved in discrete time (see Theorem 3.39). We refer to (7.7) as the Basic Stopping equation.


Remark 7.1: We stress that in this theorem $M(\tau \wedge t)$ is a martingale with respect to the original filtration $\mathcal{F}_t$. Since it is adapted to $\mathcal{F}_{\tau \wedge t}$, it is also an $\mathcal{F}_{\tau \wedge t}$-martingale (see Exercise 7.1).


Example 7.6: (Exit of Brownian Motion from an Interval)

Let $B(t)$ be Brownian motion started at $x$ and $\tau$ be the first time when $B(t)$ exits the interval $(a,b)$, $a < x < b$, that is, $\tau = \inf\{t : B(t) = a \text{ or } b\}$. Clearly, $\tau$ is a stopping time. By the basic stopping result (7.7), $EB(t \wedge \tau) = B(0) = x$. By definition of $\tau$, $|B(t \wedge \tau)| \le \max(|a|, |b|)$. Thus one can take $t \to \infty$, and use dominated convergence to obtain $EB(\tau) = x$. But $B(\tau) = b$ with probability $p$ and $B(\tau) = a$ with probability $1 - p$. From these equations we obtain that $p = (x - a)/(b - a)$ is the probability that Brownian motion reaches $b$ before it reaches $a$.
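The same stopping argument applies to the simple symmetric random walk on the integers, which is a martingale, and gives the same answer $(x - a)/(b - a)$. As a discrete analogue (an illustration added here, not the Brownian computation itself), the sketch below finds the probabilities $h(x)$ of reaching $b$ before $a$ by iterating the averaging relation $h(x) = \frac{1}{2}\big(h(x-1) + h(x+1)\big)$ with $h(a) = 0$, $h(b) = 1$. The function name and the number of sweeps are assumptions.

```python
def prob_hit_upper_first(a, b, sweeps=5000):
    """h[x] = P_x(simple symmetric random walk hits b before a), integers a < b.

    Solves h(x) = (h(x-1) + h(x+1)) / 2 on a < x < b with h(a) = 0, h(b) = 1
    by repeated Gauss-Seidel sweeps over the interior points.
    """
    h = {x: 0.0 for x in range(a, b + 1)}
    h[b] = 1.0
    for _ in range(sweeps):
        for x in range(a + 1, b):
            h[x] = 0.5 * (h[x - 1] + h[x + 1])
    return h
```

With `a = -3` and `b = 5`, the computed `h[0]` agrees with `(0 - (-3)) / (5 - (-3)) = 0.375`, the value given by the formula of Example 7.6.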


If $M$ is a martingale then $EM(t) = EM(0)$. If $\tau$ is a stopping time, then $EM(\tau)$ may be different from $EM(0)$, as the next example shows.


Example 7.7: Let $B(t)$ be Brownian motion started at 0 and let $\tau$ be the hitting time of 1. Then by definition $B(\tau) = 1$ and $EB(\tau) = 1 \ne 0 = EB(0)$.


However, under some additional assumptions on the martingale or on the stopping time, the random stopping does not alter the expected value. The following result gives sufficient conditions for optional stopping to hold.


Theorem 7.15 (Optional Stopping) Let $M(t)$ be a martingale.

1. If $\tau \le K < \infty$ is a bounded stopping time then $EM(\tau) = EM(0)$.

2. If $M(t)$ is uniformly integrable, then for any stopping time $\tau$, $EM(\tau) = EM(0)$.
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The first statement follows from the Basic Stopping Result by taking t > K 
and applying (7.7), EM(0) = EM (t A T) =EM(rt). 

Applied to gambling this shows that when betting on a martingale, on 
average no loss or gain is made, even if a clever stopping rule is used, provided 
it is bounded. 

We don’t give a proof of the second statement, the difficult point is in 
showing that M(r) is integrable. 


Theorem 7.16 Let M(t) be a martingale and T a finite stopping time. If 


E|M(7)| < œ, and 
lim E(M(t)I(7 > t)) =0, (7.8) 


then EM(r) = EM(0). 
PROOF: Write $M(\tau \wedge t)$ as
$$M(\tau \wedge t) = M(t)I(t < \tau) + M(\tau)I(t \ge \tau). \qquad (7.9)$$
Using Basic Stopping Result (7.7), $EM(\tau \wedge t) = EM(0)$. Taking expectations
in (7.9), we have
$$EM(0) = E\big(M(t)I(t < \tau)\big) + E\big(M(\tau)I(t \ge \tau)\big). \qquad (7.10)$$
Now take the limit in (7.10) as $t \to \infty$. Since $\tau$ is finite, $I(t \ge \tau) \to I(\tau < \infty) = 1$.
Moreover $|M(\tau)|I(t \ge \tau) \le |M(\tau)|$, which is integrable. Hence $E(M(\tau)I(t \ge \tau)) \to EM(\tau)$
by dominated convergence. It is assumed that $E(M(t)I(t < \tau)) \to 0$
as $t \to \infty$, and the result follows.
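To see that condition (7.8) cannot simply be dropped, one can revisit Example 7.7, where $\tau$ is the hitting time of 1 by Brownian motion started at 0. The following sketch assumes that $\tau$ is finite (this is shown in Example 7.8) and uses only the decomposition (7.9) together with the Basic Stopping Result:

```latex
0 = EB(t\wedge\tau)
  = E\big(B(t)I(t<\tau)\big) + E\big(B(\tau)I(t\ge\tau)\big)
  = E\big(B(t)I(t<\tau)\big) + P(\tau\le t),
\qquad\text{so}\qquad
E\big(B(t)I(t<\tau)\big) = -P(\tau\le t) \;\longrightarrow\; -1 \ne 0 .
```

Here $E|B(\tau)| = 1 < \infty$, yet the limit in (7.8) equals $-1$ rather than 0, which is consistent with $EB(\tau) = 1 \ne EB(0)$.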


The Basic Stopping result or Optional Stopping are used to find the distribu- 
tion of stopping times for Brownian motion and Random Walks. 


Example 7.8: (Hitting times of Brownian Motion)

We derive the Laplace transform of hitting times, from which it also follows that they
are finite. Let B(t) be a Brownian motion starting at 0, and $T_b = \inf\{t : B(t) = b\}$,
$b > 0$. Consider the exponential martingale of Brownian motion $e^{uB(t) - u^2 t/2}$, $u > 0$,
stopped at $T_b$, $e^{uB(t \wedge T_b) - (t \wedge T_b)u^2/2}$. Using the Basic Stopping result (7.7)
$$E e^{uB(t \wedge T_b) - (t \wedge T_b)u^2/2} = 1.$$
The martingale is bounded from above by $e^{ub}$ and it is positive. If we take it as
already proven that $T_b$ is finite, $P(T_b < \infty) = 1$, then we obtain by taking $t \to \infty$
that $E e^{ub - T_b u^2/2} = 1$. Replacing u by $\sqrt{2u}$, we obtain the Laplace transform of
$T_b$
$$\psi(u) = E\big(e^{-uT_b}\big) = e^{-b\sqrt{2u}}. \qquad (7.11)$$
We now show the finiteness of $T_b$. Write the expectation of the stopped martingale
$$E\big(e^{ub - T_b u^2/2} I(T_b \le t)\big) + E\big(e^{uB(t) - tu^2/2} I(T_b > t)\big) = 1. \qquad (7.12)$$
The term $E\big(e^{uB(t) - tu^2/2} I(T_b > t)\big) \le E\big(e^{ub - tu^2/2} I(T_b > t)\big) \le e^{ub - tu^2/2} \to 0$, as
$t \to \infty$. Thus taking limits in the above equation (7.12),
$E\big(e^{ub - T_b u^2/2} I(T_b \le t)\big) \to E\big(e^{ub - T_b u^2/2} I(T_b < \infty)\big) = 1$. Therefore
$E\big(e^{-T_b u^2/2} I(T_b < \infty)\big) = e^{-ub}$. But $e^{-T_b u^2/2} I(T_b = \infty) = 0$, therefore by adding
this term, we can write $E\big(e^{-T_b u^2/2}\big) = e^{-ub}$. It follows in particular that
$P(T_b < \infty) = \lim_{u \downarrow 0} \psi(u) = 1$, and $T_b$ is finite. Hence (7.11) is proved. The
distribution of $T_b$ corresponding to the transform (7.11) is given in Theorem 3.18.
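The transform (7.11) can be checked numerically. The sketch below is illustrative only: it assumes the classical density of the hitting time, $f(t) = b\,(2\pi t^3)^{-1/2} e^{-b^2/(2t)}$ (taken here as a known fact, not derived in this example), and integrates $e^{-ut} f(t)$ with a trapezoid rule after the substitution $t = 1/y^2$. The function names and the truncation point are choices made for this illustration.

```python
import math


def laplace_of_hitting_time(u, b, steps=20_000):
    """Approximate E exp(-u * T_b) by integrating against the density of T_b.

    Assumed density (not derived here):
        f(t) = b / sqrt(2*pi*t**3) * exp(-b*b / (2*t)).
    Substituting t = 1 / y**2 gives the smooth integral
        (2*b / sqrt(2*pi)) * integral_0^inf exp(-u / y**2 - (b*y)**2 / 2) dy.
    """
    y_max = 12.0 / b  # the Gaussian factor is below exp(-72) beyond this point
    h = y_max / steps
    total = 0.0
    for k in range(1, steps + 1):  # the integrand tends to 0 as y -> 0, so skip k = 0
        y = k * h
        weight = 0.5 if k == steps else 1.0  # trapezoid end-point weight
        total += weight * math.exp(-u / (y * y) - 0.5 * (b * y) ** 2)
    return 2.0 * b / math.sqrt(2.0 * math.pi) * h * total


def laplace_formula(u, b):
    """Right-hand side of (7.11)."""
    return math.exp(-b * math.sqrt(2.0 * u))


if __name__ == "__main__":
    for u, b in [(0.5, 1.0), (0.1, 2.0)]:
        print(u, b, laplace_of_hitting_time(u, b), laplace_formula(u, b))
```

The two printed columns should agree to several decimal places for any $u > 0$ and $b > 0$; the agreement is a consistency check, not a proof.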


The following result is in some sense the converse to the Optional Stopping 
Theorem. 


Theorem 7.17 Let X(t), $t \ge 0$, be such that for any bounded stopping time
$\tau$, $X(\tau)$ is integrable and $EX(\tau) = EX(0)$. Then X(t), $t \ge 0$, is a martingale.


PROOF: The proof consists of checking the martingale property by using
appropriate stopping times. Since a deterministic time t is a stopping time,
X(t) is integrable. Without loss of generality take X(0) = 0. Next we show
that for t > s, $E(X(t)|\mathcal{F}_s) = X(s)$. In other words, we need to show that for
any s < t and any set $B \in \mathcal{F}_s$
$$E(X(t)I(B)) = E(X(s)I(B)). \qquad (7.13)$$
Fix a set $B \in \mathcal{F}_s$ and for any t > s, define a stopping time $\tau = sI(B) + tI(B^c)$.
We have $E(X(\tau)) = E(X(s)I(B)) + E(X(t)I(B^c))$. Since $EX(\tau) = 0$,
$$E(X(s)I(B)) = EX(\tau) - E(X(t)I(B^c)) = -E(X(t)I(B^c)).$$
The right hand side of the above equality does not depend on s. In particular,
since $EX(t) = 0$ also gives $E(X(t)I(B)) = -E(X(t)I(B^c))$, it follows
that (7.13) holds.


The following result is sometimes known as the Optional Sampling Theo- 
rem (see for example, Rogers and Williams (1990)). 


Theorem 7.18 (Optional Sampling) Let M(t) be a uniformly integrable
martingale, and $\tau_1 \le \tau_2 \le \infty$ two stopping times. Then
$$E(M(\tau_2)|\mathcal{F}_{\tau_1}) = M(\tau_1), \quad \text{a.s.} \qquad (7.14)$$


Optional Stopping of Discrete Time Martingales 


We consider next the case of discrete time t = 0,1,2..., and martingales 
arising in a Random Walk. 
Gambler's Ruin


Consider a game played by two people by betting on the outcomes of tosses 
of a coin. You win $1 if Heads come up and lose $1 if Tails come up. The 
game stops when one party has no money left. You start with x, and your 
opponent with b dollars. Then Sn, the amount of money you have at time n 
is a Random Walk (see Section 3.12). The Gambler’s ruin problem is to find 
the probabilities of ruin of the players. 

In this game the loss of one person is the gain of the other (a zero sum
game). Assuming that the game will end in a finite time $\tau$ (this fact will be
shown later), it follows that the ruin probabilities of the players add up to one.

Consider first the case of the fair coin. Then
$$S_n = x + \sum_{i=1}^{n} \xi_i, \quad P(\xi_i = 1) = \tfrac{1}{2}, \quad P(\xi_i = -1) = \tfrac{1}{2},$$
is a martingale (see Theorem 3.33). Let $\tau$ be the time when the game stops,
the first time the amount of money you have is equal to 0 (your ruin) or x + b
(your opponent's ruin). Then $\tau$ is a stopping time. Denote by u the probability
of your ruin. It is the probability of you losing your initial capital x before
winning b dollars. Thus
$$P(S_\tau = 0) = u \quad \text{and} \quad P(S_\tau = x + b) = 1 - u. \qquad (7.15)$$
Formally applying the Optional Stopping Theorem
$$E(S_\tau) = S_0 = x. \qquad (7.16)$$
But
$$E(S_\tau) = (x + b) \times (1 - u) + 0 \times u = (x + b)(1 - u).$$
These equations give
$$u = \frac{b}{x + b}. \qquad (7.17)$$
So the ruin probabilities are given by a simple calculation using martingale
stopping.
We now justify the steps. $S_n$ is a martingale, and $\tau$ is a stopping time.
By Theorem 7.14 the stopped process $S_{n \wedge \tau}$ is a martingale. It is non-negative
and bounded by x + b, by the definition of $\tau$. Thus $S_{n \wedge \tau}$ is a uniformly
integrable martingale. Hence it converges almost surely to a finite limit Y
(with EY = x), $\lim_{n \to \infty} S_{n \wedge \tau} = Y$. By Theorem 3.33 $S_n^2 - n$ is a martingale,
and so is $S_{n \wedge \tau}^2 - n \wedge \tau$. Thus for all n by taking expectation
$$E(S_{n \wedge \tau}^2) = E(n \wedge \tau) + E(S_0^2). \qquad (7.18)$$
By dominated convergence, the lhs has a finite limit, therefore there is a finite
limit $\lim_{n \to \infty} E(n \wedge \tau)$. Expanding this, $E(n \wedge \tau) \ge nP(\tau > n)$, we can see
that for a limit to exist it must be
$$\lim_{n \to \infty} P(\tau > n) = 0, \qquad (7.19)$$
so that $P(\tau < \infty) = 1$, and $\tau$ is finite. (Note that a standard proof of finiteness
of $\tau$ is done by using Markov Chain Theory, the property of recurrence of
states in a Random Walk.) Writing $E(S_{\tau \wedge n}) = x$ and taking limits as $n \to \infty$,
the equation (7.16) is obtained (alternatively, the conditions of the Optional
Stopping Theorem hold). This concludes a rigorous derivation of the ruin
probability in an unbiased Random Walk.

We now consider the case when the Random Walk is biased, $p \ne q$.
$$S_n = x + \sum_{i=1}^{n} \xi_i, \quad P(\xi_i = 1) = p, \quad P(\xi_i = -1) = q = 1 - p.$$
In this case the exponential martingale of the Random Walk $M_n = (q/p)^{S_n}$
is used (see Theorem 3.33). Stopping this martingale, we obtain the ruin
probability
$$u = \frac{(q/p)^{b+x} - (q/p)^{x}}{(q/p)^{b+x} - 1}. \qquad (7.20)$$
Justification of the equation $E(M_\tau) = M_0$ is similar to the previous case.
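The ruin probabilities (7.17) and (7.20) can be cross-checked without martingales. The sketch below solves the first-step equations $u(k) = p\,u(k+1) + q\,u(k-1)$, $u(0) = 1$, $u(x+b) = 0$ in exact rational arithmetic and compares the result with the closed forms. First-step analysis is a different technique from the one used in the text and is brought in here only as an independent check; the function names are illustrative.

```python
from fractions import Fraction


def ruin_probability_formula(x, b, p):
    """Ruin probability from (7.17) for p = 1/2 and from (7.20) otherwise."""
    p = Fraction(p)
    q = 1 - p
    if p == q:
        return Fraction(b, x + b)
    r = q / p
    return (r ** (x + b) - r ** x) / (r ** (x + b) - 1)


def ruin_probability_recursion(x, b, p):
    """Solve u(k) = p*u(k+1) + q*u(k-1), u(0) = 1, u(x+b) = 0 exactly.

    Any solution has the form u(k) = 1 + c*s(k), where s solves the same
    recursion with s(0) = 0, s(1) = 1; c is fixed by u(x+b) = 0.
    """
    p = Fraction(p)
    q = 1 - p
    n = x + b
    s = [Fraction(0), Fraction(1)]
    for k in range(1, n):
        # rearranged recursion: p*s(k+1) = s(k) - q*s(k-1)
        s.append((s[k] - q * s[k - 1]) / p)
    c = -1 / s[n]
    return 1 + c * s[x]


if __name__ == "__main__":
    for p in (Fraction(1, 2), Fraction(2, 5)):
        print(p, ruin_probability_recursion(3, 5, p), ruin_probability_formula(3, 5, p))
```

With x = 3 and b = 5 the fair game gives ruin probability 5/8, and an unfavourable coin (p = 2/5) pushes it higher, as one would expect.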


Hitting Times in Random Walks 


Let $S_n$ denote a Random Walk on the integers started at $S_0 = x$, $S_n =
S_0 + \sum_{i=1}^{n} \xi_i$, $P(\xi_i = 1) = p$, $P(\xi_i = -1) = q = 1 - p$, with arbitrary p, and
$T_b$ the first hitting time of b, $T_b = \inf\{n : S_n = b\}$ (infimum of an empty set
is infinity). Without loss of generality take the starting state x = 0, otherwise
consider the process $S_n - x$. Consider hitting the level b > 0; for b < 0 consider
the process $-S_n$.

We find the Laplace transform of $T_b$, $\psi(\lambda) = E(e^{-\lambda T_b})$, $\lambda > 0$, by stopping
the exponential martingale of the Random Walk $M_n = e^{uS_n - nh(u)}$, where
$h(u) = \ln E(e^{u\xi_1})$, and u is arbitrary (see Section 3.12).
$$E(M_{n \wedge T_b}) = E e^{uS_{n \wedge T_b} - (n \wedge T_b)h(u)} = 1. \qquad (7.21)$$
Take u, so that $h(u) = \lambda > 0$. Write the expectation in (7.21) as
$$E\big(e^{uS_{T_b} - T_b h(u)} I(T_b \le n)\big) + E\big(e^{uS_n - nh(u)} I(T_b > n)\big).$$
The first term equals $E\big(e^{ub - T_b h(u)} I(T_b \le n)\big)$. The second term converges
to zero, because by definition of $T_b$,
$$E\big(e^{uS_n - nh(u)} I(T_b > n)\big) \le E\big(e^{ub - nh(u)} I(T_b > n)\big) \le e^{ub - nh(u)} \to 0.$$
Now taking limits in (7.21), using dominated convergence we obtain
$$E\big(e^{ub - T_b h(u)} I(T_b < \infty)\big) = 1.$$
Note that $e^{-h(u)T_b} I(T_b = \infty) = 0$, therefore by adding this term, we can write
$$E\big(e^{-h(u)T_b}\big) = e^{-ub}. \qquad (7.22)$$
This is practically the Laplace transform of $T_b$; it remains to replace h(u) by
$\lambda$, by taking $u = h^{(-1)}(\lambda)$, with $h^{(-1)}$ being the inverse of h. Thus the Laplace
transform of $T_b$ is given by
$$\psi(\lambda) = E\big(e^{-\lambda T_b}\big) = e^{-b h^{(-1)}(\lambda)}. \qquad (7.23)$$
To find $h^{(-1)}(\lambda)$, solve $h(u) = \lambda$, which is equivalent to $E(e^{u\xi_1}) = e^{\lambda}$, or
$pe^{u} + (1 - p)e^{-u} = e^{\lambda}$. There are two values for $e^{u} = \big(e^{\lambda} \pm \sqrt{e^{2\lambda} - 4p(1 - p)}\big)/(2p)$,
but only one corresponds to a Laplace transform, (7.22). Thus we have
$$\psi(\lambda) = E\big(e^{-\lambda T_b}\big) = \left(\frac{2p}{e^{\lambda} + \sqrt{e^{2\lambda} - 4p(1 - p)}}\right)^{b}. \qquad (7.24)$$


Using a general result on Laplace transform of a random variable,
$$P(T_b < \infty) = \lim_{\lambda \downarrow 0} \psi(\lambda) = \left(\frac{2p}{1 + |1 - 2p|}\right)^{b}. \qquad (7.25)$$
It now follows that the hitting time $T_b$ of b is finite if and only if $p \ge 1/2$.
For p < 1/2, there is a positive probability that level b is never reached,
$P(T_b = \infty) = 1 - \big(\frac{p}{1 - p}\big)^{b}$.

When the hitting time of level b is finite, it may or may not have a finite
expectation. If $p \ge 1/2$ we have
$$E(T_b) = -\psi'(0) = \begin{cases} \dfrac{b}{2p - 1} & \text{if } p > 1/2, \\ \infty & \text{if } p = 1/2. \end{cases}$$
Thus we have shown that when $p \ge 1/2$ any positive state will be reached
from 0 in a finite time, but when p = 1/2 the average time for it to happen is
infinite.
The results obtained above are known as transience ($p \ne 1/2$) and recurrence
(p = 1/2) of the Random Walk, and are usually obtained by Markov
Chains Theory.
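The formulas for the Laplace transform and for $P(T_b < \infty)$ can be checked by computing the distribution of $T_b$ directly. The sketch below propagates the probability mass of the walk below level b step by step (absorbing at b), truncates after a fixed number of steps, and compares with the closed forms $\big(2p/(e^{\lambda} + \sqrt{e^{2\lambda} - 4p(1-p)})\big)^b$ and $\big(2p/(1 + |1-2p|)\big)^b$. The truncation level and the function names are assumptions of this illustration.

```python
import math


def hitting_time_pmf(p, b, n_max):
    """Return [P(T_b = n) for n = 0..n_max] for the walk started at 0, b >= 1."""
    q = 1.0 - p
    # probs[k] = P(S_n = b - 1 - k, T_b > n): mass sitting k + 1 units below b
    probs = [0.0] * (n_max + b + 1)
    probs[b - 1] = 1.0  # the walk starts at 0, i.e. b units below the level
    pmf = [0.0]
    for _ in range(n_max):
        pmf.append(p * probs[0])  # an up-step from b - 1 hits b
        new = [0.0] * len(probs)
        for k, mass in enumerate(probs):
            if mass == 0.0:
                continue
            if k >= 1:
                new[k - 1] += p * mass  # up-step, still below b
            if k + 1 < len(new):
                new[k + 1] += q * mass  # down-step
        probs = new
    return pmf


def laplace_formula(lam, p, b):
    """Closed form of E exp(-lam * T_b), as in (7.24)."""
    root = math.sqrt(math.exp(2.0 * lam) - 4.0 * p * (1.0 - p))
    return (2.0 * p / (math.exp(lam) + root)) ** b


def prob_finite(p, b):
    """Closed form of P(T_b < infinity), as in (7.25)."""
    return (2.0 * p / (1.0 + abs(1.0 - 2.0 * p))) ** b


if __name__ == "__main__":
    pmf = hitting_time_pmf(0.3, 2, 400)
    print(sum(pmf), prob_finite(0.3, 2))
    print(sum(math.exp(-0.2 * n) * m for n, m in enumerate(pmf)), laplace_formula(0.2, 0.3, 2))
```

For p = 0.3 and b = 2 the total mass is close to $(3/7)^2$, so the level is missed with positive probability, in line with the transience statement above.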
Example 7.9: (Optional stopping of discrete time martingales)
Let M(t) be a discrete time martingale and $\tau$ be a stopping time such that $E|M(\tau)| < \infty$.

1. If $E\tau < \infty$ and $|M(t + 1) - M(t)| \le K$, then $EM(\tau) = EM(0)$.

2. If $E\tau < \infty$ and $E(|M(t + 1) - M(t)| \,|\, \mathcal{F}_t) \le K$, then $EM(\tau) = EM(0)$.
PROOF: We prove the first statement.
$M(t) = M(0) + \sum_{i=0}^{t-1} (M(i + 1) - M(i))$. This together with the bound on increments
gives
$$|M(t)| \le |M(0)| + \sum_{i=0}^{t-1} |M(i + 1) - M(i)| \le |M(0)| + Kt.$$
Take for simplicity non-random M(0). Then
$$E|M(t)|I(\tau > t) \le |M(0)|P(\tau > t) + KtP(\tau > t).$$
The first term converges to zero because $\tau$ is finite. The last term converges to zero, $tP(\tau > t) \le E(\tau I(\tau > t)) \to 0$, by dominated
convergence due to $E(\tau) < \infty$. Thus condition (7.8) holds, and the result follows.
The proof of the second statement is similar and is left as an exercise.


7.5 Localization and Local Martingales 


As was seen earlier in Chapter 4, Itô integrals $\int_0^t X(s)dB(s)$ are martingales
under the additional condition $E\int_0^t X^2(s)ds < \infty$. In general, stochastic integrals
with respect to martingales are only local martingales rather than true
martingales. This is the main reason for introducing local martingales. We
have also seen that for the calculation of expectations stopping and truncations
are often used. These ideas give rise to the following


Definition 7.19 A property of a stochastic process X(t) is said to hold locally
if there exists a sequence of stopping times $\tau_n$, called the localizing sequence,
such that $\tau_n \uparrow \infty$ as $n \to \infty$ and for each n the stopped process $X(t \wedge \tau_n)$
has this property.


For example, the uniform integrability property holds locally for any martingale.
By Theorem 7.13 a martingale convergent in $L^1$ is uniformly integrable.
Here $M(t \wedge n) = M(n)$ for t > n, and therefore $\tau_n = n$ is a localizing sequence.
Local martingales are defined by localizing the martingale property.


Definition 7.20 An adapted process M(t) is called a local martingale if there
exists a sequence of stopping times $\tau_n$, such that $\tau_n \uparrow \infty$ and for each n the
stopped process $M(t \wedge \tau_n)$ is a uniformly integrable martingale in t.


As we have just seen, any martingale is a local martingale. Examples of local 
martingales which are not martingales are given below. 
Example 7.10: M(t) = 1/|B(t)|, where B(t) is the three-dimensional Brownian
motion, $B(0) = x \ne 0$. We have seen in Example 6.17 that if $D_r$ is the complementary
set to the ball of radius r centered at the origin, $D_r = \{z : |z| > r\}$,
then f(z) = 1/|z| is a harmonic function for the Laplacian on $D_r$. Consequently
$1/|B(t \wedge \tau_{D_r})|$ is a martingale, where $\tau_{D_r}$ is the time of exit from $D_r$. Take now
$\tau_n$ to be the exit time from $D_{1/n}$, that is, $\tau_n = \inf\{t > 0 : |B(t)| = 1/n\}$. Then
for any fixed n, $1/|B(t \wedge \tau_n)|$ is a martingale. $\tau_n$ increase to, say, $\tau$ and by continuity,
$B(\tau) = 0$. As Brownian motion in three dimensions never visits the origin
(see Example 6.14), it follows by continuity that $\tau$ is infinite. Thus M(t) is a local
martingale. To see that it is not a true martingale, recall that in three dimensions
Brownian motion is transient and $|B(t)| \to \infty$ as $t \to \infty$. Therefore $EM(t) \to 0$,
whereas EM(0) = 1/|x|. Since the expectation of a martingale is constant, M(t) is
not a martingale.
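The decay of EM(t) can be made quantitative. The sketch below uses the closed form $E(1/|B(t)|) = \mathrm{erf}\big(|x|/\sqrt{2t}\big)/|x|$, which is not stated in the text; it is brought in here as an assumption that follows from Newton's shell theorem, $E(1/|x + Y|) = E(1/\max(|x|, |Y|))$ for a spherically symmetric Y. A radial quadrature of the right-hand side is included as a cross-check of the error-function expression. All function names are illustrative.

```python
import math


def mean_inverse_norm(x_norm, t):
    """E[1/|B(t)|] for 3-d Brownian motion with |B(0)| = x_norm > 0, t > 0.

    Closed form assumed for this illustration (via the shell theorem).
    """
    return math.erf(x_norm / math.sqrt(2.0 * t)) / x_norm


def mean_inverse_norm_quadrature(x_norm, t, steps=20_000, r_max=12.0):
    """Midpoint-rule value of E[1/max(x_norm, sqrt(t)*R)], R chi-distributed with 3 d.o.f."""
    h = r_max / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        density = math.sqrt(2.0 / math.pi) * r * r * math.exp(-0.5 * r * r)
        total += density / max(x_norm, math.sqrt(t) * r) * h
    return total


if __name__ == "__main__":
    for t in (0.1, 1.0, 10.0, 100.0):
        print(t, mean_inverse_norm(1.0, t), mean_inverse_norm_quadrature(1.0, t))
```

With |x| = 1 the printed means fall strictly below EM(0) = 1 and decrease towards 0 as t grows, which is exactly the behaviour that rules out the martingale property.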


Example 7.11: (Itô integrals.)

Let $M(t) = \int_0^t e^{B^2(s)}dB(s)$, t > 1/4, where B is Brownian motion in one dimension
with B(0) = 0. Let $\tau_n = \inf\{t > 0 : e^{B^2(t)} = n\}$. Then for $t \le \tau_n$, the integrand is
bounded by n. By the martingale property of Itô integrals, $M(t \wedge \tau_n)$ is a martingale
in t for any n. If $\tau_n$ increased to a finite limit $\tau$, then by continuity $\exp(B^2(\tau)) = \infty$,
which is impossible; thus $\tau_n \to \tau = \infty$. Therefore M(t)
is a local martingale. To see that it is not a martingale notice that for t > 1/4,
$E(e^{2B^2(t)}) = \infty$, implying that M(t) is not integrable.


Remark 7.2: Note that it is not enough for a local martingale to be inte- 
grable in order to be a true martingale. For example, positive local martingales 
are integrable, but in general they are not martingales, but only supermartingales
(see Theorem 7.23 below). Even uniformly integrable local martingales
may not be martingales. However, if a local martingale is dominated by an 
integrable random variable then it is a martingale. 


Theorem 7.21 Let M(t), $0 \le t < \infty$, be a local martingale such that
$|M(t)| \le Y$, with $EY < \infty$. Then M is a uniformly integrable martingale.


PROOF: Let $\tau_n$ be a localizing sequence. Then for any n and s < t
$$E(M(t \wedge \tau_n)|\mathcal{F}_s) = M(s \wedge \tau_n). \qquad (7.26)$$
M is clearly integrable, since $E|M(t)| \le EY < \infty$. Since $\lim_{n \to \infty} M(t \wedge \tau_n) =
M(t)$, by dominated convergence of conditional expectations $\lim_{n \to \infty} E(M(t \wedge
\tau_n)|\mathcal{F}_s) = E(M(t)|\mathcal{F}_s)$. Since $\lim_{n \to \infty} M(s \wedge \tau_n) = M(s)$, the martingale
property is established by taking limits in (7.26). If a martingale is dominated
by an integrable random variable then it is uniformly integrable (see Theorem
7.6).


Corollary 7.22 Let M(t), $0 \le t < \infty$, be a local martingale such that for all
t, $E(\sup_{s \le t} |M(s)|) < \infty$. Then it is a martingale, and as such it is uniformly
integrable on any finite interval [0,T]. If in addition $E(\sup_{t \ge 0} |M(t)|) < \infty$,
then M(t), $t \ge 0$, is uniformly integrable on $[0, \infty)$.


In financial applications we meet positive local martingales. 


Theorem 7.23 A non-negative local martingale M(t), $0 \le t \le T$, is a supermartingale,
that is, $EM(t) < \infty$, and for any s < t, $E(M(t)|\mathcal{F}_s) \le M(s)$.


PROOF: Let $\tau_n$ be a localizing sequence. Then since $M(t \wedge \tau_n) \ge 0$, by Fatou's
lemma
$$E\big(\liminf_{n} M(t \wedge \tau_n)\big) \le \liminf_{n} E(M(t \wedge \tau_n)). \qquad (7.27)$$
Since the limit exists, the lower limit is the same, that is, $\lim_{n \to \infty} M(t \wedge \tau_n) =
M(t)$ implies $\liminf_{n \to \infty} M(t \wedge \tau_n) = M(t)$. But $EM(t \wedge \tau_n) = EM(0 \wedge \tau_n) =
EM(0)$ by the martingale property of $M(t \wedge \tau_n)$. Therefore by taking limits,
$EM(t) \le EM(0)$, so that M is integrable. The supermartingale property is
established similarly. Using Fatou's lemma for conditional expectations,
$$E\big(\liminf_{n} M(t \wedge \tau_n)|\mathcal{F}_s\big) \le \liminf_{n} E(M(t \wedge \tau_n)|\mathcal{F}_s) = \liminf_{n} M(s \wedge \tau_n). \qquad (7.28)$$
Taking limits as $n \to \infty$ we obtain $E(M(t)|\mathcal{F}_s) \le M(s)$ almost surely.


From this result and Theorem 7.3 we obtain 


Theorem 7.24 A non-negative local martingale M(t), $0 \le t \le T$, is a martingale
if and only if EM(T) = EM(0).


For a general local martingale a necessary and sufficient condition to be 
a uniformly integrable martingale is described in terms of the property of 
Dirichlet class (D). This class of processes also arises in other areas of calculus 
and is given in the next section. 


Dirichlet Class (D) 


Definition 7.25 A process X is of Dirichlet class, (D), if the family
$\{X(\tau) : \tau \text{ a finite stopping time}\}$ is uniformly integrable.


Any uniformly integrable martingale M is of class (D). Indeed, by Theorem
7.13, M is closed by $Y = M(\infty)$, $M(\tau) = E(Y|\mathcal{F}_\tau)$, and the last family is
uniformly integrable.

Using localization one can show the other direction, and we have a theorem 


Theorem 7.26 A local martingale M is a uniformly integrable martingale if 
and only if it is of class (D). 
PROOF: Suppose that M is a local martingale of class (D).
Let $\tau_n$ be a localizing sequence, so that $M(t \wedge \tau_n)$ is a uniformly integrable
martingale in t. Then for s < t,
$$M(s \wedge \tau_n) = E(M(t \wedge \tau_n)|\mathcal{F}_s). \qquad (7.29)$$
The martingale property of M is obtained by taking $n \to \infty$ in both sides of
the equation.

Since $\tau_n \to \infty$, $M(s \wedge \tau_n) \to M(s)$ almost surely. $s \wedge \tau_n$ is a finite stopping
time, and because M is in (D), the sequence of random variables $\{M(s \wedge \tau_n)\}_n$
is uniformly integrable. Thus $M(s \wedge \tau_n) \to M(s)$ also in $L^1$, that is,
$$E|M(s \wedge \tau_n) - M(s)| \to 0. \qquad (7.30)$$
Using the properties of conditional expectation,
$$\begin{aligned}
E\big|E(M(t \wedge \tau_n)|\mathcal{F}_s) - E(M(t)|\mathcal{F}_s)\big| &= E\big|E(M(t \wedge \tau_n) - M(t)\,|\,\mathcal{F}_s)\big| \\
&\le E\big(E(|M(t \wedge \tau_n) - M(t)|\,|\,\mathcal{F}_s)\big) \\
&= E|M(t \wedge \tau_n) - M(t)|.
\end{aligned}$$
The latter converges to zero by (7.30). This implies $E(M(t \wedge \tau_n)|\mathcal{F}_s) \to
E(M(t)|\mathcal{F}_s)$ as $n \to \infty$. Taking limits in (7.29) as $n \to \infty$ establishes the
martingale property of M. Since it is in (D), by taking $\tau = t$, it is uniformly
integrable.


7.6 Quadratic Variation of Martingales 


Quadratic variation of a process X(t) is defined as a limit in probability
$$[X, X](t) = \lim \sum_{i=1}^{n} \big(X(t_i^n) - X(t_{i-1}^n)\big)^2, \qquad (7.31)$$
where the limit is taken over partitions:
$$0 = t_0^n < t_1^n < \ldots < t_n^n = t,$$
with $\delta_n = \max_{1 \le i \le n}(t_i^n - t_{i-1}^n) \to 0$. If M(t) is a martingale, then $M^2(t)$
is a submartingale, and its mean increases (unless M(t) is a constant). By
compensating $M^2(t)$ by some increasing process, it is possible to make it into
a martingale. The process which compensates $M^2(t)$ to a martingale turns out
to be the quadratic variation process of M. It can be shown that quadratic
variation of martingales exists and is characterized by the above property.
Theorem 7.27


1. Let M(t) be a martingale with finite second moments, $E(M^2(t)) < \infty$ for
all t. Then its quadratic variation process [M, M](t) defined in (7.31)
exists, moreover $M^2(t) - [M, M](t)$ is a martingale.


2. If M is a local martingale, then [M, M](t) exists, moreover
$M^2(t) - [M, M](t)$ is a local martingale.


PROOF: We outline the proof of the first statement only. The second follows 
for locally square integrable martingales by localization. For local martin- 
gales the result follows from representation of quadratic variation by means 
of stochastic integrals. For a full proof see for example, Liptser and Shiryaev 
(1989), p.56-59. 


$$E(M(t)M(s)) = E\,E(M(t)M(s)|\mathcal{F}_s) = E\big(M(s)E(M(t)|\mathcal{F}_s)\big) = E(M^2(s)). \qquad (7.32)$$
Using this it is easy to obtain
$$E(M(t) - M(s))^2 = E(M^2(t)) - E(M^2(s)). \qquad (7.33)$$


It is easy to see that the sums in the definition of quadratic variation [M, M](t)
have constant mean, that of $EM^2(t)$. It is possible, but is not easy, to prove
that these sums converge in probability to the limit [M, M](t). Now using
property (7.33), we can write
$$\begin{aligned}
E(M^2(t) - M^2(s)|\mathcal{F}_s) &= E\big((M(t) - M(s))^2|\mathcal{F}_s\big) \\
&= E\Big(\sum_{i=0}^{n-1} (M(t_{i+1}) - M(t_i))^2 \,\Big|\, \mathcal{F}_s\Big), \qquad (7.34)
\end{aligned}$$
where $\{t_i\}$ is a partition of [s, t]. Taking the limit as the size of the partition
goes to zero, we obtain
$$E(M^2(t) - M^2(s)|\mathcal{F}_s) = E([M, M](t) - [M, M](s)|\mathcal{F}_s). \qquad (7.35)$$
Rearranging, we obtain the martingale property of $M^2(t) - [M, M](t)$.
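A discrete-time analogue makes the compensation idea concrete. For the simple symmetric random walk $S_n$ (steps $\pm 1$), the sum of squared increments up to time n is exactly n, and $S_n^2 - n$ is a martingale. The sketch below verifies the conditional-expectation identity $E(S_n^2 - n \mid \text{first } s \text{ steps}) = S_s^2 - s$ by exhaustive enumeration in exact arithmetic. The random walk setting and the function names are choices made for this illustration, not part of the proof above.

```python
from fractions import Fraction
from itertools import product


def quadratic_variation(steps):
    """Sum of squared increments along one path of the walk."""
    return sum(step * step for step in steps)


def check_compensator(n, s):
    """Verify E(S_n^2 - n | first s steps) = S_s^2 - s for every history of length s."""
    tails = list(product((-1, 1), repeat=n - s))
    for head in product((-1, 1), repeat=s):
        s_s = sum(head)
        # average of S_n^2 - n over all equally likely continuations of the path
        total = sum((s_s + sum(tail)) ** 2 - n for tail in tails)
        if Fraction(total, len(tails)) != s_s ** 2 - s:
            return False
    return True


if __name__ == "__main__":
    print(quadratic_variation([1, -1, -1, 1, 1]))  # equals the number of steps
    print(check_compensator(8, 3))
```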


For the next result note that if M is a martingale, then for any t
$$E(M(t) - M(0))^2 = E(M^2(t)) - E(M^2(0)),$$
which shows that $E(M^2(t)) > E(M^2(0))$ unless M(t) = M(0) a.s. Thus $M^2$
cannot be a martingale on [0, t], unless M(t) = M(0). If M(t) = M(0), then
for all s < t, $M(s) = E(M(t)|\mathcal{F}_s) = M(0)$, and M is a constant on [0, t].
Theorem 7.28 Let M be a martingale with M(0) = 0. If for some t, M(t)
is not identically zero, then [M, M](t) > 0. Conversely, if [M, M](t) = 0, then
M(s) = 0 a.s. for all $s \le t$. The result also holds for local martingales.


PROOF: We prove the result for square integrable martingales; for local 
martingales it can be shown by localization. Suppose that [M,M](t) = 0 
for some t > 0. Then, since [M, M] is non-decreasing, [M, M](s) = 0 for 
all s < t. By Theorem 7.27 M?(s), s < t, is a martingale. In particular, 
E(M?(t)) = 0. This implies that M(t) = 0 a.s., which is a contradiction. 
Therefore [M, M](t) > 0.

Conversely, if [M, M](t) = 0, the same argument shows that M(t) = 0 a.s., 
and by the martingale property M(s) = 0 for all s < t. 


It also follows from the proof that M and [M, M] have same intervals of con- 
stancy. This theorem implies remarkably that a continuous martingale which 
is not a constant has infinite variation on any interval. 


Theorem 7.29 Let M be a continuous local martingale, and fix any t. If M(t)
is not identically equal to M(0), then M has infinite variation over [0, t].


PROOF: M(t) - M(0) is a martingale, null at zero, with its value at time t not
equal identically to zero. By the above theorem M has a positive quadratic 
variation on [0, t], [M, M](t) > 0. By Theorem 1.10 a continuous process of fi- 
nite variation on [0, t] has zero quadratic variation over this interval. Therefore 
M must have infinite variation over [0, t]. 


Corollary 7.30 If a continuous local martingale has finite variation over an 
interval, then it must be a constant over that interval. 


Remark 7.3: Note that there are martingales with finite variation, but by the 
previous result they can not be continuous. An example of such a martingale 
is the Poisson process martingale N(t) — t. 


7.7 Martingale Inequalities 


M(t) denotes a martingale or a local martingale on the interval [0, T] with
possibly $T = \infty$.


Theorem 7.31 If M(t) is a martingale (or a positive submartingale) then for
$p \ge 1$
$$P\big(\sup_{s \le t} |M(s)| \ge a\big) \le a^{-p} \sup_{s \le t} E(|M(s)|^p). \qquad (7.36)$$
If p > 1, then
$$E\big(\sup_{s \le t} |M(s)|^p\big) \le \Big(\frac{p}{p - 1}\Big)^{p} E(|M(t)|^p). \qquad (7.37)$$
The case of p = 2 is called Doob's inequality for martingales.
$$E\big(\sup_{s \le T} M(s)^2\big) \le 4E(M^2(T)). \qquad (7.38)$$
As a consequence, if for p > 1, $\sup_{t \le T} E(|M(t)|^p) < \infty$, then M(t) is
uniformly integrable (This is a particular case of Theorem 7.7).
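Doob's inequality (7.38) can be illustrated on a discrete-time martingale where both sides are computable exactly. The sketch below enumerates all $2^n$ paths of the simple symmetric random walk (steps $\pm 1$, started at 0) and returns $E(\max_{k \le n} S_k^2)$ together with $4E(S_n^2) = 4n$. Using the random walk in place of a continuous-time martingale is an assumption of this illustration; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def doob_l2_sides(n):
    """Return (E max_{k<=n} S_k^2, 4 * E S_n^2) for the +-1 walk, computed exactly."""
    total_max_sq = 0
    total_end_sq = 0
    for steps in product((-1, 1), repeat=n):
        position = 0
        running_max_sq = 0
        for step in steps:
            position += step
            running_max_sq = max(running_max_sq, position * position)
        total_max_sq += running_max_sq
        total_end_sq += position * position
    paths = 2 ** n
    return Fraction(total_max_sq, paths), 4 * Fraction(total_end_sq, paths)


if __name__ == "__main__":
    lhs, rhs = doob_l2_sides(10)
    print(float(lhs), float(rhs))
```

For n = 10 the right-hand side equals 40, and the left-hand side lies between $E(S_{10}^2) = 10$ and 40.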


Theorem 7.32 If M is a locally square integrable martingale with M(0) = 0,
then
$$P\big(\sup_{t \le T} |M(t)| > a\big) \le a^{-2} E([M, M](T)). \qquad (7.39)$$


Theorem 7.33 (Davis' Inequality) There are constants c > 0 and $C < \infty$
such that for any local martingale M(t), null at zero,
$$cE\Big(\sqrt{[M, M](T)}\Big) \le E\big(\sup_{t \le T} |M(t)|\big) \le CE\Big(\sqrt{[M, M](T)}\Big). \qquad (7.40)$$


Theorem 7.34 (Burkholder-Gundy Inequality) There are constants $c_p$
and $C_p$ depending only on p, such that for any local martingale M(t), null at
zero,
$$c_p E\big([M, M](T)^{p/2}\big) \le E\big((\sup_{t \le T} |M(t)|)^p\big) \le C_p E\big([M, M](T)^{p/2}\big), \qquad (7.41)$$
for $1 < p < \infty$. If moreover, M(t) is continuous, then the result holds also for
$0 < p \le 1$.


The above inequalities hold when T is a stopping time. 

Proofs of these inequalities involve concepts of stochastic calculus for gen- 
eral processes and can be found, for example, in Protter (1992), Rogers and 
Williams (1990), Liptser and Shiryayev (1989). 

We use the above inequalities to give sufficient conditions for a local mar- 
tingale to be a true martingale. 


Theorem 7.35 Let M(t) be a local martingale, null at zero, such that
$E\big(\sqrt{[M, M](t)}\big) < \infty$ for all t. Then M(t) is a uniformly integrable martingale
on [0, T], for any finite T.
If moreover, $E[M, M](t) < \infty$, then M(t) is a martingale with
$EM^2(t) = E[M, M](t) < \infty$ for all t.
If $\sup_{t < \infty} E[M, M](t) < \infty$, then M(t) is a square integrable martingale.
PROOF: By the Davis inequality, $\sup_{t \le T} |M(t)|$ is an integrable random variable,
$E\big(\sup_{t \le T} |M(t)|\big) \le CE\big(\sqrt{[M, M](T)}\big) < \infty$. Thus M(t) is dominated
by an integrable random variable on any finite time interval. Therefore it is a
uniformly integrable martingale by Theorem 7.21, and the first claim is proved.
The condition $E[M, M](t) < \infty$ implies the previous condition
$E\big(\sqrt{[M, M](t)}\big) < \infty$, as for $X \ge 0$, $E(X) \ge (E(\sqrt{X}))^2$, due to $Var(\sqrt{X}) \ge 0$.
Thus M is a martingale. Alternatively, use the Burkholder-Gundy inequality
with p = 2. Next, recall that by Theorem 7.27, if M(t) is a martingale
with $E(M^2(t)) < \infty$, then $M^2(t) - [M, M](t)$ is a martingale. In particular,
for any finite t
$$E(M^2(t)) = E[M, M](t),$$
and the second statement is proved. To prove the third, notice that since
both sides in the above equation are non-decreasing, they have a limit. Since
by assumption $\lim_{t \to \infty} E[M, M](t) < \infty$, $\sup_{t < \infty} EM^2(t) < \infty$, and M(t),
$0 \le t < \infty$, is a square integrable martingale.


Application to Itô integrals


Let $X(t) = \int_0^t H(s)dB(s)$. Being an Itô integral, X is a local martingale.
Its quadratic variation is given by $[X, X](t) = \int_0^t H^2(s)ds$. The Burkholder-Gundy
inequality with p = 2 gives $E\big(\sup_{t \le T} X^2(t)\big) \le CE([X, X](T)) =
CE\big(\int_0^T H^2(s)ds\big)$. If $E\big(\int_0^T H^2(s)ds\big) < \infty$, then X(t) is a square integrable martingale.
Thus we recover the known fact that $E(X^2(t)) = E\big(\int_0^t H^2(s)ds\big)$.

The Davis inequality gives $E\big(\sup_{t \le T} \big|\int_0^t H(s)dB(s)\big|\big) \le CE\Big(\sqrt{\int_0^T H^2(s)ds}\Big)$.
Thus the condition
$$E\Big(\sqrt{\int_0^t H^2(s)ds}\Big) < \infty \qquad (7.42)$$
is a sufficient condition for the Itô integral to be a martingale and, in particular,
to have zero mean. This condition, however, does not assure second moments.


7.8 Continuous Martingales. Change of Time 


Brownian motion is the basic continuous martingale from which all continuous 
martingales can be constructed, either by random change of time, given in this 
section, or by stochastic integration, as will be seen in the next chapter. The 
starting point is a result that characterizes a Brownian motion. 
Levy's Characterization of Brownian Motion


Theorem 7.36 (Levy) A process M with M(0) = 0 is a Brownian motion if
and only if it is a continuous local martingale with quadratic variation process
[M, M](t) = t.


PROOF: If M is a Brownian motion, then it is a continuous martingale with
[M, M](t) = t.

Let M(t) be a continuous local martingale with [M, M](t) = t. Then uM (t) 
is a continuous local martingale with [uM, uM](t) = u?t. We show that 


$$U(t) = e^{uM(t) - u^2 t/2} = e^{uM(t) - [uM, uM](t)/2} \qquad (7.43)$$


is a martingale. Once this is established, the rest of the proof follows by an 
application of the martingale property. 

The general theory of integration with respect to martingales is required
to show the martingale property of U(t). It is an easy corollary of a general
result on stochastic exponential martingales (Theorem 8.17, Corollary 8.18).
Writing the martingale property, we have
$$E\big(e^{uM(t) - u^2 t/2}\,|\,\mathcal{F}_s\big) = e^{uM(s) - u^2 s/2}, \qquad (7.44)$$
from which it follows that
$$E\big(e^{u(M(t) - M(s))}\,|\,\mathcal{F}_s\big) = e^{u^2(t - s)/2}. \qquad (7.45)$$
Since the right hand side of (7.45) is non-random, it follows that M(t) has
independent increments. Taking expectation in (7.45), we obtain
$$E\big(e^{u(M(t) - M(s))}\big) = e^{u^2(t - s)/2}, \qquad (7.46)$$
which shows that the increment of the martingale M(t) - M(s) has Normal
distribution with mean zero and variance (t - s). Therefore M is a continuous
process with independent Gaussian increments, hence it is Brownian motion.


Example 7.12: Any solution of Tanaka's SDE in Example 5.15 is a Brownian
motion (weak uniqueness).

dX(t) = sign(X(t))dB(t), where sign(x) = 1 if $x \ge 0$ and -1 if x < 0. X(0) = 0.
$X(t) = \int_0^t \text{sign}(X(s))dB(s)$. Since it is an Itô integral, it is a local martingale (even
a martingale, as the condition for it to be a martingale holds). It is continuous, and
its quadratic variation is given by $[X, X](t) = \int_0^t \text{sign}^2(X(s))ds = t$. Therefore it is
a Brownian motion.
Change of Time for Martingales


The main result below states that a continuous martingale M is a Brownian
motion with a change of time, where time is measured by the quadratic
variation [M, M](t), namely, there is a Brownian motion B(t), such that
M(t) = B([M, M](t)). This B(t) is constructed from M(t). Define
$$\tau_t = \inf\{s : [M, M](s) > t\}. \qquad (7.47)$$
If [M, M](t) is strictly increasing, then $\tau_t$ is its inverse.


Theorem 7.37 (Dambis, Dubins-Schwarz) Let M(t) be a continuous martingale,
null at zero, such that [M, M](t) is non-decreasing to $\infty$, and $\tau_t$ defined
by (7.47). Then the process $B(t) = M(\tau_t)$ is a Brownian motion with respect
to the filtration $\mathcal{F}_{\tau_t}$. Moreover, [M, M](t) is a stopping time with respect to this
filtration, and the martingale M can be obtained from the Brownian motion B
by the change of time M(t) = B([M, M](t)). The result also holds when M is
a continuous local martingale.


We outline the idea of the proof, for details see, for example, Rogers and 
Williams (1990), p.64, Karatzas and Shreve (1988) p.174, Protter (1992) p.81, 
Revuz and Yor (1998) p. 181. 

PROOF: Let M(t) be a local martingale. $\tau_t$ defined by (7.47) are finite
stopping times, since $[M, M](t) \to \infty$. Thus $\mathcal{F}_{\tau_t}$ are well defined (see Chapter 2
for the definition of $\mathcal{F}_\tau$). Note that $\{[M, M](s) \le t\} = \{\tau_t \ge s\}$.
This implies that [M, M](s) are stopping times for $\mathcal{F}_{\tau_t}$. Since [M, M](s) is
continuous, $[M, M](\tau_t) = t$. Let $X(t) = M(\tau_t)$. Then it is a continuous local
martingale, since M and [M, M] have the same intervals of constancy
(see the comment following Theorem 7.28). Using Theorem 7.27 we obtain
$EX^2(t) = E[X, X](t) = E[M, M](\tau_t) = t$. Thus X is a Brownian motion
by Levy's characterization Theorem 7.36. The second part is proven as follows.
Recall that M and [M, M] have the same intervals of constancy. Thus
$X([M, M](t)) = M(\tau_{[M, M](t)}) = M(t)$.


Example 7.13: Let $M(t) = \int_0^t f(s)dB(s)$, with f continuous and non-random.
Then M is a Gaussian martingale. Its quadratic variation is given by $[M, M](t) =
\int_0^t f^2(s)ds$. For example, with f(s) = s, $M(t) = \int_0^t s\,dB(s)$ and $[M, M](t) =
\int_0^t s^2 ds = t^3/3$. In this example [M, M](t) is non-random and increasing. $\tau_t$ is
given by its inverse, $\tau_t = (3t)^{1/3}$. Let $X(t) = M(\tau_t) = \int_0^{(3t)^{1/3}} s\,dB(s)$. Then, clearly,
X is continuous, as a composition of continuous functions. It is also a martingale
with quadratic variation $\tau_t^3/3 = t$. Hence, by Levy's theorem, it is a Brownian
motion, X(t) = B(t). By the above theorem, $M(t) = B(t^3/3)$.
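For a general non-random integrand the inverse of the quadratic variation rarely has a closed form, but the definition (7.47) can be evaluated numerically. The sketch below computes $[M, M](t) = \int_0^t f^2(s)ds$ by a midpoint rule and inverts it by bisection, then compares with $\tau_t = (3t)^{1/3}$ for f(s) = s. The step counts, the bracketing strategy and the function names are assumptions of this illustration, and f is assumed not to vanish identically on any interval.

```python
def quadratic_variation(f, t, steps=2000):
    """Midpoint-rule approximation of the integral of f(s)^2 over [0, t]."""
    h = t / steps
    return sum(f((k + 0.5) * h) ** 2 for k in range(steps)) * h


def tau(f, t):
    """Approximate tau_t = inf{s : [M, M](s) > t} by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while quadratic_variation(f, hi) <= t:  # enlarge the bracket until it contains tau_t
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if quadratic_variation(f, mid) > t:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    t = 2.0
    print(tau(lambda s: s, t), (3.0 * t) ** (1.0 / 3.0))
```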
Example 7.14: If $M(t) = \int_0^t H(s)dB(s)$ is an Itô integral, then it is a local martingale
with quadratic variation $[M, M](t) = \int_0^t H^2(s)ds$. If $\int_0^\infty H^2(s)ds = \infty$, then
$M(t) = B\big(\int_0^t H^2(s)ds\big)$, where B(t) is Brownian motion and can be recovered from
M(t) with the appropriate change of time.


Example 7.15: (Brownian Bridge as Time Changed Brownian motion)
The SDE for Brownian Bridge (5.34) contains as its only stochastic term $Y(t) = \int_0^t \frac{1}{T - s}dB(s)$.
Since for any t < T, it is a continuous martingale with quadratic variation
$[Y, Y](t) = \int_0^t \frac{1}{(T - s)^2}ds = \frac{t}{T(T - t)}$, it follows by the DDS Theorem that
$$Y(t) = \hat{B}\Big(\frac{t}{T(T - t)}\Big)$$
for some Brownian motion $\hat{B}$. Therefore SDE (5.34) has the following representation
$$X(t) = a\Big(1 - \frac{t}{T}\Big) + b\frac{t}{T} + (T - t)\hat{B}\Big(\frac{t}{T(T - t)}\Big), \quad \text{for } 0 \le t \le T. \qquad (7.48)$$
In this representation t = T is allowed and understood by continuity, since the limit
of tB(1/t) as $t \to 0$ is zero by the Law of Large Numbers for Brownian motion.
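As a quick consistency check of this representation, the variance of the time-changed Brownian term can be computed directly, using only that a Brownian motion evaluated at a non-random time s has variance s. The result is the familiar Brownian Bridge variance, which vanishes at both endpoints (the symbol $\hat{B}$ denotes the Brownian motion obtained from the DDS Theorem):

```latex
\operatorname{Var}\!\left((T-t)\,\hat{B}\!\left(\frac{t}{T(T-t)}\right)\right)
  = (T-t)^2\cdot\frac{t}{T(T-t)}
  = \frac{t\,(T-t)}{T},
\qquad 0 \le t < T .
```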


Change of Time in SDEs 


We use the Change of Time (DDS) Theorem for constructing weak solutions
of some SDEs. Let
$$X(t) = \int_0^t \sqrt{f'(s)}\,dB(s), \qquad (7.49)$$
where f(t) is an adapted, positive, increasing, differentiable process, null at
zero. It is a local martingale with quadratic variation
$[X, X](t) = \int_0^t f'(s)ds = f(t)$. Thus $\tau_t = f^{(-1)}(t)$, the inverse of f, and
according to the Change of Time Theorem, the process $X(f^{(-1)}(t)) = \hat{B}(t)$ is
a Brownian motion (with respect to $\mathcal{F}_{\tau_t}$), and
$$X(t) = \hat{B}(f(t)). \qquad (7.50)$$
Thus from equations (7.49) and (7.50) we have
Theorem 7.38 Let f(t) be an adapted, positive, increasing, differentiable process, and
$$dX(t) = \sqrt{f'(t)}\,dB(t). \qquad (7.51)$$
Then the process B(f(t)) is a weak solution.


We can write equation (7.51) as follows: for a Brownian motion B, and a
function f, there is a Brownian motion $\hat{B}$, such that
$$dB(f(t)) = \sqrt{f'(t)}\,d\hat{B}(t). \qquad (7.52)$$
In the case of non-random change of time in Brownian motion B(f(t)), it
is easy to check directly that M(t) = B(f(t)) is a martingale (with respect
to the filtration $\mathcal{F}_{f(t)}$). The quadratic variation of B(f(t)), [M, M](t) =
[B(f), B(f)](t) = f(t), was calculated directly in Example 4.24, equation
(4.63).


Example 7.16: (Ornstein-Uhlenbeck process as Time Changed Brownian motion)
With $f(t) = \sigma^2(e^{2\alpha t} - 1)/(2\alpha)$, the process $B(\sigma^2(e^{2\alpha t} - 1)/(2\alpha))$ is a weak solution
to the SDE

$$dX(t) = \sigma e^{\alpha t}\,dB(t).$$


Consider $U(t) = e^{-\alpha t}X(t)$. Integrating by parts, $U(t)$ satisfies
$$dU(t) = -\alpha U(t)\,dt + \sigma\,dB(t). \tag{7.53}$$


Recall that the solution to this SDE is given by (5.13). Thus $U(t)$ is an Ornstein-Uhlenbeck
process (see Example 5.6). Thus an Ornstein-Uhlenbeck process has the representation

$$U(t) = e^{-\alpha t}B(\sigma^2(e^{2\alpha t} - 1)/(2\alpha)). \tag{7.54}$$
To have $U(0) = x$, take $B(t)$ to be a Brownian motion started at $x$. Note that in
equations (7.53) and (7.54), $B(t)$ denotes different Brownian motions.
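The representation (7.54) can be sanity-checked without simulation. The sketch below, with illustrative function names and under the assumption that the Brownian motion in (7.54) starts at zero, compares the variance implied by the time change with the well-known Ornstein-Uhlenbeck variance $\sigma^2(1 - e^{-2\alpha t})/(2\alpha)$, and checks numerically that the clock $f$ has derivative $\sigma^2 e^{2\alpha t}$, as required by (7.51).

```python
import math


def ou_clock(t, alpha, sigma):
    """The deterministic time change f(t) = sigma^2 (e^{2 alpha t} - 1) / (2 alpha)."""
    return sigma ** 2 * (math.exp(2 * alpha * t) - 1) / (2 * alpha)


def ou_variance_from_time_change(t, alpha, sigma):
    """Var of e^{-alpha t} B(f(t)), assuming B is standard Brownian motion from 0."""
    return math.exp(-2 * alpha * t) * ou_clock(t, alpha, sigma)


def ou_variance_classical(t, alpha, sigma):
    """Variance of the solution of dU = -alpha U dt + sigma dB with U(0) = 0."""
    return sigma ** 2 * (1 - math.exp(-2 * alpha * t)) / (2 * alpha)


def clock_derivative(t, alpha, sigma, h=1e-6):
    """Central finite-difference approximation of f'(t)."""
    return (ou_clock(t + h, alpha, sigma) - ou_clock(t - h, alpha, sigma)) / (2 * h)


if __name__ == "__main__":
    alpha, sigma, t = 0.7, 1.3, 2.0
    print(ou_variance_from_time_change(t, alpha, sigma),
          ou_variance_classical(t, alpha, sigma))
    print(clock_derivative(t, alpha, sigma), sigma ** 2 * math.exp(2 * alpha * t))
```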


Next we construct a weak solution to the SDEs of the form

$$dX(t) = \sigma(X(t))\,dB(t)$$
with $\sigma(x) > 0$ such that

$$G(t) = \int_0^t \frac{ds}{\sigma^2(B(s))}$$

is finite for finite $t$, and increases to infinity,

$$\int_0^\infty \frac{ds}{\sigma^2(B(s))} = \infty \quad \text{almost surely.}$$

Then $G(t)$ is adapted, continuous and strictly increasing to $G(\infty) = \infty$. Therefore it has inverse
$$\tau_t = G^{(-1)}(t). \tag{7.55}$$


Note that for each fixed $t$, $\tau_t$ is a stopping time, as it is the first time the
process $G(s)$ hits $t$, and that $\tau_t$ is increasing.


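Theorem 7.39 The process $X(t) = B(\tau_t)$ is a weak solution to the SDE
$$dX(t) = \sigma(X(t))\,dB(t). \tag{7.56}$$


PROOF: $X(t) = B(\tau_t) = B(G^{(-1)}(t))$. Using equation (7.52) with $f = G^{(-1)}$,
we obtain
$$dB(G^{(-1)}(t)) = \sqrt{(G^{(-1)})'(t)}\,d\hat B(t).$$
By the rule for the derivative of an inverse function,
$$(G^{(-1)})'(t) = \frac{1}{G'(G^{(-1)}(t))} = \frac{1}{1/\sigma^2(B(G^{(-1)}(t)))} = \sigma^2(B(\tau_t)). \tag{7.57}$$


Thus we obtain
$$dB(\tau_t) = \sigma(B(\tau_t))\,d\hat B(t),$$
and the result is proved.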


Another proof, by considering the martingale problem, is given next. Note
that (7.57) gives $d\tau_t = \sigma^2(B(\tau_t))\,dt$.

PROOF: The diffusion operator for the SDE (7.56) is given by
$Lf(x) = \frac{1}{2}\sigma^2(x)f''(x)$. We show that $X(t) = B(\tau_t)$ is a solution to the martingale
problem for $L$. Indeed, we know (see Example 5.17) that for any
twice continuously differentiable function $f$ vanishing outside a compact interval,
the process $M(t) = f(B(t)) - \int_0^t \frac{1}{2}f''(B(s))\,ds$ is a martingale. Since
$\tau_t$ are increasing stopping times, it can be shown (by using the Optional Stopping
Theorem 7.18) that the process $M(\tau_t)$ is also a martingale, that is,
$f(B(\tau_t)) - \int_0^{\tau_t} \frac{1}{2}f''(B(s))\,ds$ is a martingale. Now perform the change of variable
$s = \tau_u$, and observe from (7.57) that $d\tau_t = \sigma^2(B(\tau_t))\,dt$, to obtain that
the process $f(B(\tau_t)) - \int_0^t \frac{1}{2}\sigma^2(B(\tau_u))f''(B(\tau_u))\,du$ is a martingale. But since
$X(t) = B(\tau_t)$, this is the same as saying that $f(X(t)) - \int_0^t \frac{1}{2}\sigma^2(X(u))f''(X(u))\,du$ is a
martingale, and $X(t)$ solves the martingale problem for $L$.
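A minimal worked instance of Theorem 7.39 may help fix the notation. It assumes the simplest possible coefficient, a constant $\sigma(x) \equiv c > 0$ (this choice is an illustration, not part of the original text), for which every object in the construction is explicit:

```latex
\begin{align*}
G(t) &= \int_0^t \frac{ds}{\sigma^2(B(s))} = \frac{t}{c^2},
  & \tau_t &= G^{(-1)}(t) = c^2 t, \\
X(t) &= B(\tau_t) = B(c^2 t),
  & [X,X](t) &= c^2 t .
\end{align*}
% By Brownian scaling, \hat B(t) = B(c^2 t)/c is again a Brownian motion, so
% X(t) = c \hat B(t) and dX(t) = c \, d\hat B(t) = \sigma(X(t)) \, d\hat B(t),
% in agreement with (7.56) and with d\tau_t = \sigma^2(B(\tau_t)) \, dt = c^2 \, dt.
```

Here the time change is deterministic, so the weak solution is just a rescaled Brownian motion; for non-constant $\sigma$ the clock $\tau_t$ becomes random but the same steps apply.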


An application of Theorem 7.37 gives a result on uniqueness of the solution
of SDE (7.56). This result is weaker than Theorem 6.13 of Engelbert-Schmidt.


Theorem 7.40 Let $\sigma(x)$ be a positive function bounded away from zero, $\sigma(x) \ge
\delta > 0$. Then the stochastic differential equation (7.56) has a unique weak solution.


PROOF: Let $X(t)$ be a weak solution to (7.56). Then $X(t)$ is a local martingale
and there is a Brownian motion $\beta(t)$, such that $X(t) = \beta([X,X](t))$.
Now,

$$[X,X](t) = \int_0^t \sigma^2(X(s))\,ds = \int_0^t \sigma^2(\beta([X,X](s)))\,ds.$$

Thus $[X,X](t)$ is a solution to the ordinary differential equation (ODE)
$da(t) = \sigma^2(\beta(a(t)))\,dt$. Since the solution to this ODE is unique, the solution
to (7.56) is unique.


A more general change of time is done for the stochastic differential equation
$$dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dB(t). \tag{7.58}$$
Let $g(x)$ be a positive function for which $G(t) = \int_0^t g(X(s))\,ds$ is finite for finite
$t$ and increases to infinity almost surely. Define $\tau_t = G^{(-1)}(t)$.

Theorem 7.41 Let $X(t)$ be a solution to (7.58) and define $Y(t) = X(\tau_t)$.
Then $Y(t)$ is a weak solution to the stochastic differential equation

$$dY(t) = \frac{\mu(Y(t))}{g(Y(t))}\,dt + \frac{\sigma(Y(t))}{\sqrt{g(Y(t))}}\,dB(t), \quad \text{with } Y(0) = X(0). \tag{7.59}$$


One can use the change of time on an interval [0, T], for a stopping time T. 


Example 7.17: (Lamperti's Change of Time)
Let $X(t)$ satisfy the SDE (Feller's branching diffusion)

$$dX(t) = \mu X(t)\,dt + \sigma\sqrt{X(t)}\,dB(t), \quad X(0) = x > 0, \tag{7.60}$$
with positive constants $\mu$ and $\sigma$. Lamperti's change of time is $G(t) = \int_0^t X(s)\,ds$.
Here $g(x) = x$. Then $Y(t) = X(\tau_t)$ satisfies the SDE

$$dY(t) = \frac{\mu Y(t)}{Y(t)}\,dt + \frac{\sigma\sqrt{Y(t)}}{\sqrt{Y(t)}}\,dB(t) = \mu\,dt + \sigma\,dB(t) \quad \text{with } Y(0) = x,$$
and
$$Y(t) = x + \mu t + \sigma B(t). \tag{7.61}$$

In other words, with a random change of time, the branching diffusion is a Brownian
motion with drift. At the (random) point where $G(t)$ stops increasing, its inverse $\tau_t$,
defined as the right-inverse $\tau_t = \inf\{s : G(s) = t\}$, also remains the same. This
happens at the point of time when $X(t) = 0$. It can be seen that once the process is
at zero, it stays at 0 forever. Let $T = \inf\{t : X(t) = 0\}$. $T$ is a stopping time, and
$Y(t)$ is the Brownian motion with drift stopped at that time.

The other direction is also true: a branching diffusion can be obtained from a
Brownian motion with drift. Let $Y(t)$ satisfy (7.61), and let $T = \inf\{t : Y(t) =
0\}$. $Y(t) > 0$ for $t < T$. Define

$$G(t) = \int_0^{t \wedge T} \frac{ds}{Y(s)},$$

and let $\tau_t$ be the inverse of $G$, which is well defined on $[0, T)$. Then $X(t) = Y(\tau_t)$
satisfies the SDE (7.60) stopped when it hits zero.
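On a discretised path the clock $G$ and its right-inverse are easy to compute. The following sketch is only an illustration under stated assumptions: the path values are sampled on a uniform grid of mesh `dt`, $G$ is approximated by left-point Riemann sums, and the right-inverse is taken as the first grid time at which $G$ reaches the given level. The helper names `cumulative_clock` and `right_inverse` are hypothetical.

```python
from bisect import bisect_left


def cumulative_clock(values, dt):
    """Left-point Riemann sums: G[k] approximates the integral of g(X(s)) over [0, k*dt].

    `values[j]` is g(X(j*dt)); for Lamperti's change of time g(x) = x.
    """
    G = [0.0]
    for v in values:
        G.append(G[-1] + v * dt)
    return G


def right_inverse(G, dt, level):
    """Grid approximation of tau = inf{s : G(s) >= level}.

    G must be non-decreasing. Returns None when G never reaches `level`,
    which is what happens after the path has been absorbed at zero.
    """
    k = bisect_left(G, level)   # first index with G[k] >= level
    if k == len(G):
        return None
    return k * dt


if __name__ == "__main__":
    # A toy non-negative path that hits zero and stays there.
    path = [2.0, 1.0, 0.0, 0.0]
    G = cumulative_clock(path, dt=0.5)
    print(G)                          # the clock stops increasing once the path is at zero
    print(right_inverse(G, 0.5, 1.5))
    print(right_inverse(G, 0.5, 2.0))
```

Once the path sits at zero the clock is flat, so levels above its final value are never reached; this mirrors the remark that the time-changed process is a Brownian motion with drift only up to the hitting time of zero.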


Remark 7.4: Any solution to an SDE with time independent coefficients can 
be obtained from Brownian motion by using change of variables and random 
time change (Gihman and Skorohod (1972), p.113). 

There are three main methods used for solving SDEs: change of state 
space, that is, change of variable (Itô’s formula), change of time and change of 
measure. We have seen examples of SDEs solved by using change of variables, 
and change of time. The change of measure approach will be covered later. 


Notes. Material for this chapter is based on Protter (1992), Rogers and 
Williams (1990), Gihman and Skorohod (1972), Liptser and Shiryayev (1977), 
(1989), Revuz and Yor (1998). 
7.9 Exercises


Exercise 7.1: Let $M(t)$ be an $\mathcal{F}_t$-martingale and denote its natural filtration
by $\mathcal{G}_t$. Show that $M(t)$ is a $\mathcal{G}_t$-martingale.


Exercise 7.2: Show that an increasing integrable process is a submartingale.


Exercise 7.3: Show that if $X(t)$ is a submartingale and $g$ is a non-decreasing
convex function such that $E|g(X(t))| < \infty$, then $g(X(t))$ is a submartingale.


Exercise 7.4: Show that $M(t)$ is a square integrable martingale if and only
if $M(t) = E(Y|\mathcal{F}_t)$, where $Y$ is square integrable, $E(Y^2) < \infty$.


Exercise 7.5: (Expected exit time of Brownian motion from $(a, b)$.)
Let $B(t)$ be a Brownian motion started at $x \in (a,b)$, and $\tau = \inf\{t : B(t) =
a \text{ or } b\}$. By stopping the martingale $M(t) = B(t)^2 - t$, show that $E_x(\tau) =
(x - a)(b - x)$.


Exercise 7.6: Find the probability of $B(t) - t/2$ reaching $a$ before it reaches
$b$ when started at $x$, $a < x < b$. Hint: use the exponential martingale $M(t) =
e^{B(t) - t/2}$.


Exercise 7.7: Find the expected length of the game in Gambler's ruin, when
• betting is done on a fair coin
• betting is done on a biased coin


Exercise 7.8: Give the probability of ruin when playing a game of chance
against an infinitely rich opponent (with initial capital $b = \infty$).


Exercise 7.9: (Ruin Probability in Insurance) A Discrete Time Risk Model
for the surplus $U_n$ of an insurance company at the end of year $n$, $n = 1, 2, \ldots$ is
given by $U_n = U_0 + cn - \sum_{k=1}^n X_k$, where $c$ is the total annual premium and $X_k$ is
the total (aggregate) claim in year $k$. The time of ruin $T$ is the first time when
the surplus becomes negative, $T = \min\{n : U_n < 0\}$, with $T = \infty$ if $U_n \ge 0$
for all $n$. Assume that $\{X_k, k = 1, 2, \ldots\}$ are i.i.d. random variables, and
there exists a constant $R > 0$ such that $E(e^{-R(c - X_1)}) = 1$. Show that for all
$n$, $P_x(T \le n) \le e^{-Rx}$, where $U_0 = x$ is the initial funds, and the ruin probability
$P_x(T < \infty) \le e^{-Rx}$. Hint: show that $M_n = e^{-RU_n}$ is a martingale, and use
the Optional Stopping Theorem.


Exercise 7.10: (Ruin Probability in Insurance continued) Find the bound
on the ruin probability when the aggregate claims have $N(\mu, \sigma^2)$ distribution.
Give the initial amount $x$ required to keep the ruin probability below level $\alpha$.


Exercise 7.11: Let $B(t)$ be a Brownian motion starting at zero and $T$ be the
first exit time from $(-1, 1)$, that is, the first time when $|B|$ takes value 1. Use
Davis' inequality to show that $E(\sqrt{T}) < \infty$.


Exercise 7.12: Let $B(t)$ be a Brownian motion, $X(t) = \int_0^t \mathrm{sign}(B(s))\,dB(s)$.
Show that $X$ is also a Brownian motion.


Exercise 7.13: Let $M(t) = \int_0^t e^s\,dB(s)$. Find $g(t)$ such that $M(g(t))$ is a
Brownian motion.


Exercise 7.14: Let $B(t)$ be a Brownian motion. Give an SDE for $e^{-\alpha t}B(e^{2\alpha t})$.


Exercise 7.15: Prove the change of time result in SDEs, Theorem 7.41.


Exercise 7.16: Let $X(t)$ satisfy SDE $dX(t) = \mu(t)\,dt + \sigma(t)\,dB(t)$ on $[0,T]$.
Show that $X(t)$ is a local martingale if and only if $\mu(t) = 0$ a.e.


Exercise 7.17: $f(x,t)$ is differentiable in $t$ and twice in $x$. It is known that
$X(t) = f(B(t), t)$ is of finite variation. Show that $f$ is a function of $t$ alone.


Exercise 7.18: Let $Y(t) = \int_0^t B(s)\,dB(s)$ and $W(t) = \int_0^t \mathrm{sign}(B(s))\,dB(s)$.
Show that $dY(t) = \sqrt{t + 2Y(t)}\,dW(t)$. Show uniqueness of the weak solution
of the above SDE.


Chapter 8 


Calculus For 
Semimartingales 


In this chapter rules of calculus are given for the most general processes for
which stochastic calculus is developed, called semimartingales. A semimartingale
is a process consisting of a sum of a local martingale and a finite variation
process. Integration with respect to semimartingales involves integration with
respect to local martingales, and these integrals generalize the Itô integral
where integration is done with respect to a Brownian motion. Important concepts,
such as compensators and the sharp bracket processes are introduced,
and Itô's formula in its general form is given.


8.1 Semimartingales 


In stochastic calculus only regular processes are considered. These are either 
continuous processes, or right-continuous with left limits, or left-continuous 
with right limits. The regularity of the process implies that it can have at 
most countably many discontinuities, and all of them are jumps (Chapter 1). 
The definition of a semimartingale presumes a given filtration and processes 
which we consider are adapted to it. Following the classical approach, see for
example, Metivier (1982), Liptser and Shiryayev (1989) p.85, a semimartingale
is a local martingale plus a process of finite variation. More precisely,


Definition 8.1 A regular right-continuous with left limits (cadlag) adapted 
process is a semimartingale if it can be represented as a sum of two processes: 
a local martingale M(t) and a process of finite variation A(t), with M(0) = 
A(0) = 0, and 

S(t) = S(0) + M(t) + A(t). (8.1) 
Example 8.1: (Semimartingales)


1. $S(t) = B^2(t)$, where $B(t)$ is a Brownian motion, is a semimartingale. $S(t) =
M(t) + t$, where $M(t) = B^2(t) - t$ is a martingale and $A(t) = t$ is a finite
variation process.


2. $S(t) = N(t)$, where $N(t)$ is a Poisson process with rate $\lambda$, is a semimartingale,
as it is a finite variation process.


3. One way to obtain semimartingales from known semimartingales is by applying
a twice continuously differentiable ($C^2$) transformation. If $S(t)$ is a
semimartingale and $f$ is a $C^2$ function, then $f(S(t))$ is also a semimartingale.
The decomposition of $f(S(t))$ into martingale part and finite variation part is
given by Itô's formula, given later. In this way we can assert that, for example,
the geometric Brownian motion $e^{\sigma B(t) + \mu t}$ is a semimartingale.


4. A right-continuous with left limits (cadlag) deterministic function $f(t)$ is a
semimartingale if and only if it is of finite variation. Thus $f(t) = t\sin(1/t)$,
$t \in (0, 1]$, $f(0) = 0$ is continuous, but not a semimartingale (see Example 1.7).


5. A diffusion, that is, a solution to a stochastic differential equation with respect
to Brownian motion, is a semimartingale. Indeed, the Itô integral with respect
to $dB(t)$ is a local martingale and the integral with respect to $dt$ is a process
of finite variation.


6. Although the class of semimartingales is rather large, there are processes which
are not semimartingales. Examples are: $|B(t)|^\alpha$, $0 < \alpha < 1$, where $B(t)$ is the
one-dimensional Brownian motion; $\int_0^t (t - s)^{-\alpha}\,dB(s)$, $0 < \alpha < 1/2$. It requires
analysis to show that the above processes are not semimartingales.


For a semimartingale $X$, the process of jumps $\Delta X$ is defined by
$$\Delta X(t) = X(t) - X(t-), \tag{8.2}$$


and represents the jump at point t. If X is continuous, then of course, AX = 0. 


8.2 Predictable Processes 


In this section we describe the class of predictable processes. This class of
processes has a central role in the theory. In particular, only predictable
processes can be integrated with respect to a semimartingale. Recall that in
discrete time a process $H$ is predictable if $H_n$ is $\mathcal{F}_{n-1}$-measurable, that is, $H_n$
is known with certainty at time $n$ on the basis of information up to time $n - 1$.
Predictability in continuous time is harder to define. We recall some general
definitions of processes starting with the class of adapted processes.


Definition 8.2 A process $X$ is called adapted to filtration $\mathbb{F} = (\mathcal{F}_t)$, if for
all $t$, $X(t)$ is $\mathcal{F}_t$-measurable.
In the construction of the stochastic integral $\int_0^t H(u)\,dS(u)$, processes $H$ and $S$ are
taken to be adapted to $\mathbb{F}$. For a general semimartingale $S$, the requirement
that $H$ is adapted is too weak: it fails to assure measurability of some basic
constructions. $H$ must be predictable. The exact definition of predictable
processes involves $\sigma$-fields generated on $\mathbb{R}^+ \times \Omega$ and is given later in Section
8.13. Note that left-continuous processes are predictable, in the sense that
$H(t) = \lim_{s \uparrow t} H(s) = H(t-)$, so that if the values of the process before $t$ are
known, then the value at $t$ is determined by the limit. For our purposes it is
enough to describe a subclass of predictable processes which can be defined
constructively.


Definition 8.3 H is predictable if it is one of the following: 


a) a left-continuous adapted process, in particular, a continuous adapted 
process. 


b) a limit (almost sure, in probability) of left-continuous adapted processes. 


c) a regular right-continuous process such that, for any stopping time $\tau$, $H_\tau$
is $\mathcal{F}_{\tau-}$-measurable, where $\mathcal{F}_{\tau-}$ is the $\sigma$-field generated by the sets $A \cap \{t < \tau\}$, where
$A \in \mathcal{F}_t$.


d) a Borel-measurable function of a predictable process. 


Example 8.2: A Poisson process $N(t)$ is right-continuous and is obviously adapted to
its natural filtration. It can be shown, see Example 8.31, that it is not predictable.
Its left-continuous modification $N(t-) = \lim_{s \uparrow t} N(s)$ is predictable, because it is
adapted and left-continuous by a). Any measurable function (even right-continuous)
of $N(t-)$ is also predictable by d).


Example 8.3: Right-continuous adapted processes may not be predictable, even
though they can be approached by left-continuous processes, for example, $X(t) =
\lim_{\epsilon \downarrow 0} X((t + \epsilon)-)$.


Example 8.4: Let $T$ be a stopping time. This means that for any $t$, the set
$\{T > t\} \in \mathcal{F}_t$. Consider the process $X(t) = I_{[0,T]}(t)$. It is adapted, because its
values are determined by the set $\{T \ge t\}$ ($X(t) = 1$ if and only if $\omega \in \{T \ge t\}$),
and $\{T \ge t\} = \{T < t\}^c \in \mathcal{F}_t$. $X(t)$ is also left-continuous. Thus it is a predictable
process by a). We also see that $T$ is a stopping time if and only if the process
$X(t) = I_{[0,T]}(t)$ is adapted.


Example 8.5: It will be seen later that when the filtration is generated by Brownian
motion, then any right-continuous adapted process is predictable. This is why in the
definition of the Itô integral right-continuous functions are allowed as integrands.
8.3 Doob-Meyer Decomposition


Recall that a process is a submartingale if for all $s < t$, $E(X(t)|\mathcal{F}_s) \ge X(s)$
almost surely.


Theorem 8.4 If $X$ is a submartingale or a local submartingale, then there
exists a local martingale $M(t)$ and a unique increasing predictable process $A(t)$,
locally integrable, such that

$$X(t) = X(0) + M(t) + A(t). \tag{8.3}$$
If $X(t)$ is a submartingale of Dirichlet class (D) (see Definition 7.25), then
the process $A$ is integrable, that is, $\sup_t EA(t) < \infty$, and $M(t)$ is a uniformly
integrable martingale.

Example 8.6:

1. Let $X(t) = B^2(t)$ on a finite interval $t \le T$. $X(t)$ is a submartingale. Decomposition
(8.3) holds with $M(t) = B^2(t) - t$ and $A(t) = t$. Since the interval is
finite, $M$ is uniformly integrable and $A$ is integrable.

2. Let $X(t) = B^2(t)$ on the infinite interval $t \ge 0$. Then (8.3) holds with $M(t) =
B^2(t) - t$ and $A(t) = t$. Since the interval is infinite, $M$ is a martingale, and $A$
is locally integrable; for example take the localizing sequence $\tau_n = n$.

3. Let $X(t) = N(t)$ be a Poisson process with intensity $\lambda$. Then $X$ is a submartingale.
Decomposition (8.3) holds with $M(t) = N(t) - \lambda t$ and $A(t) = \lambda t$.


Proof of Theorem 8.4 can be found in Rogers and Williams (1990), p.372-375, 
Dellacherie (1972), Meyer (1966). 


Doob’s Decomposition 


For processes in discrete time, decomposition (8.3) is due to Doob, and it is
simple to obtain. Indeed, if $X_n$ is a submartingale, then clearly,
$X_{n+1} = X_0 + \sum_{i=0}^n (X_{i+1} - X_i)$. By adding and subtracting $E(X_{i+1}|\mathcal{F}_i)$, we
obtain the Doob decomposition

$$X_{n+1} = X_0 + \sum_{i=0}^n \big(X_{i+1} - E(X_{i+1}|\mathcal{F}_i)\big) + \sum_{i=0}^n \big(E(X_{i+1}|\mathcal{F}_i) - X_i\big), \tag{8.4}$$

where the martingale and the increasing process are given by

$$M_{n+1} = \sum_{i=0}^n \big(X_{i+1} - E(X_{i+1}|\mathcal{F}_i)\big) \quad \text{and} \quad A_{n+1} = \sum_{i=0}^n \big(E(X_{i+1}|\mathcal{F}_i) - X_i\big). \tag{8.5}$$

$A_n$ is increasing due to the submartingale property, $E(X_{i+1}|\mathcal{F}_i) - X_i \ge 0$ for
all $i$. It is also predictable, because $E(X_{n+1}|\mathcal{F}_n)$ and all other terms are $\mathcal{F}_n$-measurable.

It is much harder to prove decomposition (8.3) in continuous time and this
was done by Meyer.
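The discrete decomposition (8.4)-(8.5) can be computed path by path once the one-step conditional expectations are known. The sketch below is an illustration under an assumption not made in the text: it takes $X_n = S_n^2$ for a simple symmetric random walk $S_n$, for which $E(X_{i+1}|\mathcal{F}_i) = X_i + 1$, so that $A_n = n$ and $M_n = S_n^2 - n$. The function name `doob_decomposition` and its callback argument are hypothetical.

```python
def doob_decomposition(path, cond_exp_next):
    """Return (M, A) from (8.5) for an observed path X_0, ..., X_N.

    cond_exp_next(i, path) must return E(X_{i+1} | F_i); it may only use path[:i+1].
    """
    M, A = [0.0], [0.0]
    for i in range(len(path) - 1):
        e = cond_exp_next(i, path)
        M.append(M[-1] + (path[i + 1] - e))   # martingale increment
        A.append(A[-1] + (e - path[i]))       # predictable increasing increment
    return M, A


if __name__ == "__main__":
    steps = [1, -1, -1, 1, 1]                 # one realisation of the +/-1 steps
    S = [0]
    for s in steps:
        S.append(S[-1] + s)
    X = [float(v * v) for v in S]             # X_n = S_n^2 is a submartingale

    # For the simple symmetric walk, E(S_{i+1}^2 | F_i) = S_i^2 + 1.
    M, A = doob_decomposition(X, lambda i, p: p[i] + 1.0)
    print(A)   # the compensator grows by one per step
    print(M)   # equals S_n^2 - n along the path
```

This is the discrete analogue of item 1 in Example 8.6, where $B^2(t)$ decomposes into the martingale $B^2(t) - t$ and the increasing process $t$.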
8.4 Integrals with respect to Semimartingales


In this section the stochastic integral $\int_0^T H(t)\,dS(t)$ is defined, where $S(t)$ is a
semimartingale. Due to the representation $S(t) = S(0) + M(t) + A(t)$ the integral
with respect to $S(t)$ is the sum of two integrals, one with respect to a local
martingale $M(t)$ and the other with respect to a finite variation process $A(t)$.
The integral with respect to $A(t)$ can be done path by path as the Stieltjes
integral, since $A(t)$, although random, is of finite variation.

The integral with respect to the martingale $M(t)$ is new; it is the stochastic
integral $\int_0^T H(t)\,dM(t)$. When $M(t)$ is Brownian motion $B(t)$, it is the Itô
integral, defined in Chapter 4. But now martingales are allowed to have jumps
and this makes the theory more complicated. The key property used in the
definition of the Itô integral is that on finite intervals Brownian motion is a
square integrable martingale. This property in its local form plays an important
role in the general case. Conditions for the existence of the integral with
respect to a martingale involve the martingale's quadratic variation, which
was introduced in Section 7.6.


Stochastic Integral with respect to Martingales 


For a simple predictable process $H(t)$, given by

$$H(t) = H(0)I_0(t) + \sum_{i=0}^{n-1} H_i I_{(T_i, T_{i+1}]}(t), \tag{8.6}$$

where $0 = T_0 \le T_1 \le \ldots \le T_n \le T$ are stopping times and the $H_i$'s are $\mathcal{F}_{T_i}$-measurable,
the stochastic integral is defined as the sum

$$\int_0^T H(t)\,dM(t) = \sum_{i=0}^{n-1} H_i\big(M(T_{i+1}) - M(T_i)\big). \tag{8.7}$$

If $M(t)$ is a locally square integrable martingale, then by the $L^2$ theory (Hilbert
space theory) one can extend the stochastic integral from simple predictable
processes to the class of predictable processes $H$ such that

$$\int_0^T H^2(t)\,d[M,M](t) \text{ is locally integrable.} \tag{8.8}$$

If $M(t)$ is a continuous local martingale, then the stochastic integral is defined
for a wider class of predictable processes $H$ satisfying

$$\int_0^T H^2(t)\,d[M,M](t) < \infty \text{ a.s.} \tag{8.9}$$
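For a simple predictable integrand the integral is just the finite sum (8.7). A minimal sketch, assuming the integrator has already been sampled at the times $T_0, \ldots, T_n$ of one path (the function name and argument layout are illustrative):

```python
def simple_integral(H, M_at_times):
    """Evaluate sum_i H_i * (M(T_{i+1}) - M(T_i)) as in (8.7).

    H          : values H_0, ..., H_{n-1}, each known at time T_i
    M_at_times : values M(T_0), ..., M(T_n) along one path
    """
    if len(M_at_times) != len(H) + 1:
        raise ValueError("need one more sample of M than values of H")
    return sum(h * (M_at_times[i + 1] - M_at_times[i]) for i, h in enumerate(H))


if __name__ == "__main__":
    M = [0.0, 0.5, 0.25, 1.0]           # M sampled at T_0, ..., T_3
    print(simple_integral([1.0, 2.0, -1.0], M))
    print(simple_integral([1.0, 1.0, 1.0], M))   # H = 1 telescopes to M(T_3) - M(T_0)
```

Each $H_i$ multiplies the increment of $M$ over the interval that follows $T_i$, which is exactly what predictability of the integrand requires.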
Properties of Stochastic Integrals with respect to Martingales


1. Local martingale property. If $M(t)$ is a local martingale, the integral
$\int_0^t H(s)\,dM(s)$ is a local martingale.


2. Isometry property. If $M(t)$ is a square integrable martingale, and $H$
satisfies

$$E\left(\int_0^T H^2(s)\,d[M,M](s)\right) < \infty, \tag{8.10}$$

then $\int_0^t H(s)\,dM(s)$ is a square integrable martingale with zero mean and
variance

$$E\left(\int_0^t H(s)\,dM(s)\right)^2 = E\left(\int_0^t H^2(s)\,d[M,M](s)\right). \tag{8.11}$$


3. If a local martingale $M(t)$ is of finite variation, then the stochastic integral
is indistinguishable from the Stieltjes integral.


Example 8.7: Consider Itô integrals with respect to Brownian motion. Since $B(t)$
is a square integrable martingale on $[0, T]$, with $[B,B](t) = t$, we recover that for
a predictable $H$, such that $E\int_0^T H^2(s)\,ds < \infty$, $\int_0^t H(s)\,dB(s)$ is a square integrable
martingale with zero mean and variance $\int_0^t EH^2(s)\,ds$.


Stochastic Integrals with respect to Semimartingales


Let $S$ be a semimartingale with representation
$$S(t) = S(0) + M(t) + A(t), \tag{8.12}$$

where $M$ is a local martingale and $A$ is a finite variation process. Let $H$ be a
predictable process such that conditions (8.8) and (8.13) hold,

$$\int_0^T |H(t)|\,dV_A(t) < \infty, \tag{8.13}$$

where $V_A(t)$ is the variation process of $A$. Then the stochastic integral is
defined as the sum of integrals,

$$\int_0^T H(t)\,dS(t) = \int_0^T H(t)\,dM(t) + \int_0^T H(t)\,dA(t). \tag{8.14}$$

Since a representation of a semimartingale (8.12) is not unique, one should
check that the stochastic integral does not depend on the representation
used. Indeed, if $S(t) = S(0) + M_1(t) + A_1(t)$ is another representation, then
$(M - M_1)(t) = -(A - A_1)(t)$, so that $M - M_1$ is a local martingale of
finite variation. But for such martingales stochastic and Stieltjes integrals are
the same, and it follows that

$$\int H(t)\,dM_1(t) + \int H(t)\,dA_1(t) = \int H(t)\,dM(t) + \int H(t)\,dA(t) = \int H(t)\,dS(t).$$

Since the integral with respect to a local martingale is a local martingale, 
and the integral with respect to a finite variation process is a process of finite 
variation, it follows that a stochastic integral with respect to a semimartingale 
is a semimartingale. 

For details see Liptser and Shiryayev (1989), p.90-116. 


Example 8.8: Let $N(t)$ be a Poisson process. $N(t)$ is of finite variation and
the integral $\int_0^T N(t)\,dN(t)$ is well defined as a Stieltjes integral, $\int_0^T N(t)\,dN(t) =
\sum_{\tau_i \le T} N(\tau_i)$, where the $\tau_i$'s are the jumps of $N(t)$. However, $\int_0^T N(t)\,dN(t)$ is not
the stochastic integral, since $N(t)$ is not predictable, but $\int_0^T N(t-)\,dN(t)$ is. It
is indistinguishable from the integral in the sense of Stieltjes, $\int_0^T N(t-)\,dN(t) =
\sum_{\tau_i \le T} N(\tau_{i-1})$.
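The two sums in Example 8.8 are easy to evaluate once the number of jumps up to time $T$ is known, because $N(\tau_i) = i$ and $N(\tau_i-) = i - 1$. The following sketch (the function name is illustrative) computes both for a path with a given number of jumps and shows that they differ by exactly that number.

```python
def poisson_integrals(n_jumps):
    """Return (Stieltjes integral of N dN, stochastic integral of N(t-) dN)
    for a Poisson path with `n_jumps` jumps in [0, T]."""
    stieltjes = sum(i for i in range(1, n_jumps + 1))        # N(tau_i) = i
    stochastic = sum(i - 1 for i in range(1, n_jumps + 1))   # N(tau_i -) = i - 1
    return stieltjes, stochastic


if __name__ == "__main__":
    for n in (0, 1, 4):
        s, p = poisson_integrals(n)
        print(n, s, p, s - p)   # the difference equals the number of jumps
```

In closed form the sums are $n(n+1)/2$ and $n(n-1)/2$; the choice of left limits in the integrand therefore changes the value of the integral, which is why predictability matters.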


Properties of Stochastic Integrals with respect to Semimartingales 


Let $X$ be a semimartingale and $H$ a predictable process, such that the
stochastic integral exists for $0 \le t \le T$, and denote

$$(H \cdot X)(t) = \int_0^t H(s)\,dX(s).$$

Then the stochastic integral $H \cdot X$ has the following properties.


1. The jumps of the integral occur at the points of jumps of $X$, and
$\Delta(H \cdot X)(t) = H(t)\Delta X(t)$. In particular, a stochastic integral with
respect to a continuous semimartingale is continuous.


2. If $\tau$ is a stopping time, then the stopped integral is the integral with
respect to the stopped semimartingale,

$$\int_0^{t \wedge \tau} H(s)\,dX(s) = \int_0^t H(s)I(s \le \tau)\,dX(s) = \int_0^t H(s)\,dX(s \wedge \tau).$$


3. If $X$ is of finite variation, then $\int_0^t H(s)\,dX(s)$ is indistinguishable from
the Stieltjes integral computed path by path.


4. Associativity. If $Y(t) = \int_0^t H(s)\,dX(s)$ is a semimartingale, and if $K$ is
a predictable process, such that $(K \cdot Y)(t) = \int_0^t K(s)\,dY(s)$ is defined,
then $K \cdot Y = K \cdot (H \cdot X) = (KH) \cdot X$, i.e.

$$\int_0^t K(s)\,dY(s) = \int_0^t K(s)H(s)\,dX(s).$$
8.5 Quadratic Variation and Covariation


If $X, Y$ are semimartingales on the common space, then the quadratic covariation
process, also known as the square bracket process and denoted
$[X,Y](t)$, is defined, as usual, by

$$[X,Y](t) = \lim \sum_{i=0}^{n-1} \big(X(t_{i+1}^n) - X(t_i^n)\big)\big(Y(t_{i+1}^n) - Y(t_i^n)\big), \tag{8.15}$$

where the limit is taken over shrinking partitions $\{t_i^n\}_{i=0}^n$ of the interval $[0, t]$
when $\delta_n = \max_i(t_{i+1}^n - t_i^n) \to 0$ and is in probability. Taking $Y = X$ we obtain
the quadratic variation process of $X$.


Example 8.9: We have seen that the quadratic variation of Brownian motion $B(t)$ is
$[B,B](t) = t$ and of a Poisson process $N(t)$ is $[N,N](t) = N(t)$.


Properties of Quadratic Variation


We give the fundamental properties of the quadratic variation process with 
some explanations, but omit the proofs. 


1. If $X$ is a semimartingale, then $[X,X]$ exists and is an adapted process.


2. It is clear from the definition that quadratic variation over non-overlapping
intervals is the sum of the quadratic variation over each interval. As
such, $[X,X](t)$ is a non-decreasing function of $t$. Consequently $[X,X](t)$
is a function of finite variation.


3. It follows from the definition (8.15) that $[X,Y]$ is bilinear and symmetric,
that is, $[X,Y] = [Y,X]$ and

$$[\alpha X + Y, \beta U + V] = \alpha\beta[X,U] + \alpha[X,V] + \beta[Y,U] + [Y,V]. \tag{8.16}$$

4. Polarization identity.
$$[X,Y] = \frac{1}{2}\big([X+Y, X+Y] - [X,X] - [Y,Y]\big). \tag{8.17}$$

This property follows directly from the previous one.


5. $[X,Y](t)$ is a regular right-continuous (cadlag) function of finite variation.
This follows from the polarization identity, as $[X,Y]$ is the difference
of two increasing functions.


6. The jumps of the quadratic covariation process occur only at points
where both processes have jumps,

$$\Delta[X,Y](t) = \Delta X(t)\Delta Y(t). \tag{8.18}$$
7. If one of the processes $X$ or $Y$ is of finite variation, then

$$[X,Y](t) = \sum_{s \le t} \Delta X(s)\Delta Y(s). \tag{8.19}$$

Notice that although the summation is taken over all $s$ not exceeding $t$,
there are at most countably many terms different from zero.


The following property is frequently used, it follows directly from (8.19). 


Corollary 8.5 If X(t) is a continuous semimartingale of finite variation, then 
it has zero quadratic covariation with any other semimartingale Y(t). 


Quadratic Variation of Stochastic Integrals 


The quadratic covariation of stochastic integrals has the following property

$$\left[\int_0^\cdot H(s)\,dX(s), \int_0^\cdot K(s)\,dY(s)\right](t) = \int_0^t H(s)K(s)\,d[X,Y](s). \tag{8.20}$$

In particular the quadratic variation of a stochastic integral is given by

$$\left[\int_0^\cdot H(s)\,dX(s), \int_0^\cdot H(s)\,dX(s)\right](t) = \int_0^t H^2(s)\,d[X,X](s), \tag{8.21}$$

and

$$\left[\int_0^\cdot H(s)\,dX(s), Y\right](t) = \int_0^t H(s)\,d[X,Y](s). \tag{8.22}$$

Quadratic variation has a representation in terms of the stochastic integral. 


Theorem 8.6 Let $X$ be a semimartingale null at zero. Then

$$[X,X](t) = X^2(t) - 2\int_0^t X(s-)\,dX(s). \tag{8.23}$$

For a partition $\{t_i^n\}$ of $[0,t]$, consider
$$v_n(t) = \sum_{i=0}^{n-1} \big(X(t_{i+1}^n) - X(t_i^n)\big)^2.$$

For any fixed $t$ the sequence $v_n(t)$ converges in probability to a limit $[X,X](t)$
as $\max_{i \le n}(t_{i+1}^n - t_i^n) \to 0$. Moreover, there is a subsequence $n_k$, such that
the processes $v_{n_k}(t)$ converge uniformly on any bounded time interval to the
process $X^2(t) - 2\int_0^t X(s-)\,dX(s)$.
We justify (8.23) heuristically. By opening the brackets in the sum for $v_n(t)$,
adding and subtracting $X^2(t_i^n)$, we obtain

$$v_n(t) = \sum_{i=0}^{n-1} \big(X^2(t_{i+1}^n) - X^2(t_i^n)\big) - 2\sum_{i=0}^{n-1} X(t_i^n)\big(X(t_{i+1}^n) - X(t_i^n)\big).$$

The first sum is $X^2(t)$. The second has for its limit in probability
$\int_0^t X(s-)\,dX(s)$. This argument can be made into a rigorous proof, see for
example, Metivier (1982), p.175. Alternatively (8.23) can be established by
using Itô's formula.
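The algebraic step in this heuristic holds exactly for any finite sequence of values, before any limit is taken. The sketch below, with illustrative function names, checks the identity $\sum (\Delta X)^2 = X_n^2 - X_0^2 - 2\sum X_i \Delta X_i$ on an arbitrary sampled path.

```python
def discrete_qv(path):
    """Sum of squared increments, the pre-limit quantity v_n(t)."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


def left_point_sum(path):
    """Sum of X(t_i) * (X(t_{i+1}) - X(t_i)), the pre-limit of the integral of X(s-) dX(s)."""
    return sum(a * (b - a) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    path = [0.0, 1.0, -1.0, 2.0, 2.5]   # any sampled values; starts at 0 as in Theorem 8.6
    lhs = discrete_qv(path)
    rhs = path[-1] ** 2 - path[0] ** 2 - 2 * left_point_sum(path)
    print(lhs, rhs)
```

The left-point evaluation $X(t_i)$ in the second sum is what turns into the predictable integrand $X(s-)$ in the limit.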

Using the polarization identity, it is easy to see 


Corollary 8.7 For semimartingales $X, Y$ the quadratic covariation process is
given by

$$[X,Y](t) = X(t)Y(t) - X(0)Y(0) - \int_0^t X(s-)\,dY(s) - \int_0^t Y(s-)\,dX(s). \tag{8.24}$$


This is also known as the integration by parts or product rule formula. 
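Two short worked instances of (8.24) may be useful; both take $X = B$, a Brownian motion started at zero, and the choices of $Y$ are illustrative.

```latex
% (i) Y = B: since [B,B](t) = t and B is continuous, (8.24) gives
\[
  B^2(t) = 2\int_0^t B(s)\,dB(s) + t .
\]
% (ii) Y(t) = t: Y is continuous and of finite variation, so [B,Y] = 0
% by Corollary 8.5, and (8.24) reduces to the classical product rule
\[
  tB(t) = \int_0^t s\,dB(s) + \int_0^t B(s)\,ds .
\]
```

The first identity shows the correction term $[X,Y]$ at work; the second shows that it vanishes when one factor is continuous and of finite variation.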


8.6 Itô's Formula for Continuous Semimartingales


If $X(t)$ is a continuous semimartingale and $f$ is a twice continuously differentiable
function, then $Y(t) = f(X(t))$ is a semimartingale and admits the
following representation

$$f(X(t)) - f(X(0)) = \int_0^t f'(X(s))\,dX(s) + \frac{1}{2}\int_0^t f''(X(s))\,d[X,X](s). \tag{8.25}$$
In differential form this is written as
$$df(X(t)) = f'(X(t))\,dX(t) + \frac{1}{2}f''(X(t))\,d[X,X](t). \tag{8.26}$$


It follows, in particular, that $f(X(t))$ is also a semimartingale, and its
decomposition into the martingale part and the finite variation part can be
obtained from Itô's formula by splitting the stochastic integral with respect
to $X(t)$ into the integral with respect to a local martingale $M(t)$ and a finite
variation process $A(t)$.

We have given a justification of Itô's formula and examples of its use in
Chapters 4 and 6.
Remark 8.1:


1. The differentiability properties of $f$ may be relaxed. If, for example, $X$
is of finite variation, then $f$ needs to be only once continuously differentiable.
$f$ can be defined only on an open set, rather than a whole line,
but then $X$ must take its values almost surely in this set. For example,
if $X$ is a positive semimartingale, then Itô's formula can be used with
$f = \ln$.


2. Itô's formula holds for convex functions (Protter (1992) p.163), and more
generally, for functions which are the difference of two convex functions.
This is the Meyer-Itô (Itô-Tanaka) formula, see for example, Protter
(1992) p.167, Rogers and Williams (1990), p.105, Revuz and Yor p.208.
In particular, if $f$ is a convex function on $\mathbb{R}$ and $X(t)$ is a semimartingale,
then $f(X(t))$ is also a semimartingale. See also Section 8.7.


3. It follows from Itô's formula that if a semimartingale $X$ is continuous
with nil quadratic variation $[X,X](t) = 0$, then the differentiation rule
is the same as in the ordinary calculus. If $X(t)$ is a Brownian motion,
then $d[X,X](t) = dt$ and we recover formulae (4.39) and (4.53). If $X$
has jumps, then the formula has an extra term (see Section 8.10).
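As an illustration of item 1, here is formula (8.26) with $f = \ln$ for a positive continuous semimartingale $X$. The second display assumes, purely for illustration, that $X$ solves $dX(t) = \mu X(t)\,dt + \sigma X(t)\,dB(t)$ with constants $\mu$ and $\sigma$, so that $d[X,X](t) = \sigma^2 X^2(t)\,dt$.

```latex
\[
  d\ln X(t) = \frac{dX(t)}{X(t)} - \frac{1}{2}\,\frac{d[X,X](t)}{X^2(t)} .
\]
% Under the illustrative assumption dX = \mu X dt + \sigma X dB:
\[
  d\ln X(t) = \Big(\mu - \frac{\sigma^2}{2}\Big)dt + \sigma\,dB(t),
  \qquad
  X(t) = X(0)\exp\Big(\big(\mu - \tfrac{\sigma^2}{2}\big)t + \sigma B(t)\Big).
\]
```

Positivity of $X$ is what allows $\ln$, which is defined only on $(0, \infty)$, to be used in Itô's formula.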


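The following result is a direct corollary to Itô's formula.


Corollary 8.8 Let $X(t)$ be a continuous semimartingale and $f$ be twice continuously
differentiable. Then

$$[f(X), f(X)](t) = \int_0^t \big(f'(X(s))\big)^2\,d[X,X](s). \tag{8.27}$$


PROOF: Since $[X,X]$ is of finite variation it follows from (8.25) and (8.21) that

$$[f(X), f(X)](t) = \left[\int_0^\cdot f'(X(s))\,dX(s), \int_0^\cdot f'(X(s))\,dX(s)\right](t) = \int_0^t \big(f'(X(s))\big)^2\,d[X,X](s).$$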

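Itô's Formula for Functions of Several Variables


Let $f : \mathbb{R}^n \to \mathbb{R}$ be $C^2$, and let $X(t) = (X_1(t), \ldots, X_n(t))$ be a continuous
semimartingale in $\mathbb{R}^n$, that is, each $X_i$ is a continuous semimartingale. Then
$f(X)$ is a semimartingale, and has the following representation.

$$f(X(t)) - f(X(0)) = \sum_{i=1}^n \int_0^t \frac{\partial f}{\partial x_i}(X(s))\,dX_i(s) + \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X(s))\,d[X_i, X_j](s). \tag{8.28}$$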
8.7 Local Times


Let $X(t)$ be a continuous semimartingale. Consider $|X(t) - a|$, $a \in \mathbb{R}$. The
function $|x - a|$ is not differentiable at $a$, but at any other point its derivative
is given by $\mathrm{sign}(x - a)$, where $\mathrm{sign}(x) = 1$ for $x > 0$ and $\mathrm{sign}(x) = -1$ for
$x \le 0$. It is possible to extend Itô's formula for this case and prove (see Rogers
and Williams (1990), p.95-102, Protter (1992) p.165-167)


Theorem 8.9 (Tanaka's Formula) Let $X(t)$ be a continuous semimartingale.
Then for any $a \in \mathbb{R}$ there exists a continuous non-decreasing adapted
process $L^a(t)$, called the local time at $a$ of $X$, such that

$$|X(t) - a| = |X(0) - a| + \int_0^t \mathrm{sign}(X(s) - a)\,dX(s) + L^a(t). \tag{8.29}$$

As a function in $a$, $L^a(t)$ is right-continuous with left limits. For any fixed
$a$, as a function in $t$, $L^a(t)$ increases only when $X(t) = a$, that is, $L^a(t) =
\int_0^t I(X(s) = a)\,dL^a(s)$. Moreover, if $X(t)$ is a continuous local martingale,
then $L^a(t)$ is jointly continuous in $a$ and $t$.


Remark 8.2: Heuristically Tanaka's formula can be justified by a formal
application of Itô's formula to the function $|x - a|$, whose first derivative is $\mathrm{sign}(x - a)$.
The derivative of $\mathrm{sign}(x)$ is zero everywhere but at zero, where it is not defined. However, it is possible
to define the derivative as a generalized function or a Schwartz distribution,
in which case it is equal to $2\delta$. Thus the second derivative of $|x - a|$ is $2\delta(x - a)$
in the generalized function sense. The local time at $a$ of $X$ is then formally
$L^a(t) = \int_0^t \delta(X(s) - a)\,d[X,X](s)$, which for Brownian motion is $\int_0^t \delta(B(s) - a)\,ds$.
Formal use of Itô's formula gives (8.29).


Theorem 8.10 (Occupation Times Formula) Let $X(t)$ be a continuous
semimartingale with local time $L^a(t)$. Then for any bounded measurable function
$g(x)$

$$\int_0^t g(X(s))\,d[X,X](s) = \int_{-\infty}^{\infty} g(a)L^a(t)\,da. \tag{8.30}$$
In particular
$$[X,X](t) = \int_{-\infty}^{\infty} L^a(t)\,da. \tag{8.31}$$


Example 8.10: Let $X(t) = B(t)$ be Brownian motion. Then its local time at zero
process, $L^0(t)$, satisfies (Tanaka's formula)

$$L^0(t) = |B(t)| - \int_0^t \mathrm{sign}(B(s))\,dB(s). \tag{8.32}$$

The occupation times formula (8.30) becomes

$$\int_0^t g(B(s))\,ds = \int_{-\infty}^{\infty} g(a)L^a(t)\,da. \tag{8.33}$$

The time Brownian motion spends in a set $A \subset \mathbb{R}$ up to time $t$ is given by (with
$g(x) = I_A(x)$)

$$\int_0^t I_A(B(s))\,ds = \int_{-\infty}^{\infty} I_A(a)L^a(t)\,da = \int_A L^a(t)\,da. \tag{8.34}$$

Remark 8.3: Taking $A = (a, a + da)$ and $g(x) = I_{(a,a+da)}(x)$, its indicator, in
(8.33), $L^a(t)\,da$ is the time Brownian motion spends in $(a, a + da)$ up to time $t$,
which explains the name “local time”. The time Brownian motion spends in
a set $A$ is $\int_A L^a(t)\,da$, hence the name “occupation times density” formula
(8.34). For a continuous semimartingale the formula (8.30) is the “occupation
times density” formula relative to the random “clock” $d[X,X](s)$.
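As a numerical illustration of Example 8.10 and Remark 8.3, the following sketch simulates one Brownian path on a grid and compares two approximations of $L^0(t)$: the discretized Tanaka formula (8.32), and the occupation time of a small band $(-\varepsilon, \varepsilon)$ divided by its width, as suggested by (8.34). The function name `local_time_estimates`, the step count, the band half-width `eps` and the seed are illustrative assumptions, not part of the text. Both numbers are discretization approximations, so they should be expected to agree only roughly.

```python
import math
import random


def local_time_estimates(t=1.0, n_steps=200_000, eps=0.01, seed=1):
    """Approximate L^0(t) of a simulated Brownian path in two ways.

    Returns (tanaka, occupation):
      tanaka     = |B(t)| - sum sign(B(s_k)) * (B(s_{k+1}) - B(s_k)), cf. (8.32)
      occupation = (time spent in (-eps, eps)) / (2 * eps),          cf. (8.34)
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    b = 0.0
    ito_sum = 0.0          # left-endpoint (Ito) sum for int sign(B) dB
    time_in_band = 0.0     # time the grid path spends in (-eps, eps)
    for _ in range(n_steps):
        db = rng.gauss(0.0, sd)
        sign = (b > 0) - (b < 0)   # sign(B) at the left endpoint, sign(0) = 0
        ito_sum += sign * db
        if abs(b) < eps:
            time_in_band += dt
        b += db
    return abs(b) - ito_sum, time_in_band / (2.0 * eps)


if __name__ == "__main__":
    tanaka, occupation = local_time_estimates()
    print(f"Tanaka estimate:     {tanaka:.4f}")
    print(f"occupation estimate: {occupation:.4f}")
```

The Tanaka estimate is non-negative on every grid path, because $|y| \ge |x| + \operatorname{sign}(x)(y - x)$ holds term by term; this mirrors the fact that $L^0(t)$ is non-decreasing.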


Example 8.11: $X(t) = |B(t)|$ is a semimartingale, since $|x|$ is a convex function.
Its decomposition into the martingale and finite variation parts is given by Tanaka’s
formula (8.32). $\sqrt{|B(t)|}$ is not a semimartingale, see Protter (1992), p.169-170.


Example 8.12: The function $(x - a)^+$ is important in financial applications, as it
gives the payoff of a financial stock option. The Meyer-Tanaka formula for $(x - a)^+$ is

$$(X(t) - a)^+ = (X(0) - a)^+ + \int_0^t I(X(s) > a)\,dX(s) + \frac{1}{2} L^a(t). \tag{8.35}$$


Theorem 8.11 Let $L^a(t)$ be the local time of Brownian motion at $a$, and $f_t(a)$
the density of $N(0,t)$ at $a$. Then

$$E(L^a(t)) = \int_0^t f_s(a)\,ds, \quad \text{hence} \quad \frac{dE(L^a(t))}{dt} = f_t(a). \tag{8.36}$$

PROOF: Taking expectation in both sides of equation (8.33) and changing
the order of integration, we obtain for any positive and bounded $g$

$$\int_{-\infty}^{\infty} g(a) \int_0^t f_s(a)\,ds\,da = \int_{-\infty}^{\infty} g(a) E(L^a(t))\,da.$$



The result follows, since g is arbitrary. 
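
As a quick consistency check of (8.36), take $a = 0$. The computation below uses only the $N(0,s)$ density at zero, $f_s(0) = 1/\sqrt{2\pi s}$, and the standard mean of the absolute value of a normal variable, $E|B(t)| = \sqrt{2t/\pi}$:

```latex
E\bigl(L^0(t)\bigr) = \int_0^t f_s(0)\,ds
                    = \int_0^t \frac{ds}{\sqrt{2\pi s}}
                    = \sqrt{\frac{2t}{\pi}}
                    = E|B(t)| .
```

This agrees with taking expectations in Tanaka’s formula (8.32), where the stochastic integral has zero mean because its integrand is bounded.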


A similar result can be established for continuous semimartingales by using 
equation (8.30) (e.g. Klebaner (2002)). 


Remark 8.4: Local times can also be defined for discontinuous semimartingales.
For any fixed $a$, $L^a(t)$ is a continuous non-decreasing function in $t$, and
it increases only at points of continuity of $X$ where it is equal to $a$, that is,
$X(t-) = X(t) = a$. The formula (8.30) holds with quadratic variation $[X,X]$
replaced by its continuous part $[X,X]^c$, see for example, Protter (1992), p.168.
8.8 Stochastic Exponential


The stochastic exponential (also known as the semimartingale, or Doléans-Dade
exponential) is a stochastic analogue of the exponential function. Recall
that if $f(t)$ is a smooth function then $g(t) = e^{f(t)}$ is the solution to the
differential equation $dg(t) = g(t)\,df(t)$. The stochastic exponential is defined as
a solution to a similar stochastic equation. The stochastic exponential of Itô
processes was introduced in Section 5.2. For a semimartingale $X$, its stochastic
exponential $\mathcal{E}(X)(t) = U(t)$ is defined as the unique solution to the equation

$$U(t) = 1 + \int_0^t U(s-)\,dX(s) \quad \text{or} \quad dU(t) = U(t-)\,dX(t), \text{ with } U(0) = 1. \tag{8.37}$$

As an application of Itô’s formula and the rules of stochastic calculus we prove


Theorem 8.12 Let $X$ be a continuous semimartingale. Then its stochastic
exponential is given by

$$U(t) = \mathcal{E}(X)(t) = e^{X(t) - X(0) - \frac{1}{2}[X,X](t)}. \tag{8.38}$$

PROOF: Write $U(t) = e^{V(t)}$, with $V(t) = X(t) - X(0) - \frac{1}{2}[X,X](t)$. Then

$$dU(t) = d(e^{V(t)}) = e^{V(t)}\,dV(t) + \frac{1}{2} e^{V(t)}\,d[V,V](t).$$

Using the fact that $[X,X](t)$ is a continuous process of finite variation, we
obtain $[X,[X,X]](t) = 0$, and $[V,V](t) = [X,X](t)$. Using this, we obtain

$$dU(t) = e^{V(t)}\,dX(t) - \frac{1}{2} e^{V(t)}\,d[X,X](t) + \frac{1}{2} e^{V(t)}\,d[X,X](t) = e^{V(t)}\,dX(t),$$

or $dU(t) = U(t)\,dX(t)$. Thus $U(t)$ defined by (8.38) satisfies (8.37). To show
uniqueness, let $V(t)$ be another solution to (8.37), and consider $V(t)/U(t)$. By
integration by parts

$$d\left(\frac{V(t)}{U(t)}\right) = V(t)\,d\left(\frac{1}{U(t)}\right) + \frac{1}{U(t)}\,dV(t) + d\left[V, \frac{1}{U}\right](t).$$

By Itô’s formula, using that $U(t)$ is continuous and satisfies (8.37)

$$d\left(\frac{1}{U(t)}\right) = -\frac{1}{U(t)}\,dX(t) + \frac{1}{U(t)}\,d[X,X](t),$$

which leads to

$$d\left(\frac{V(t)}{U(t)}\right) = -\frac{V(t)}{U(t)}\,dX(t) + \frac{V(t)}{U(t)}\,dX(t) + \frac{V(t)}{U(t)}\,d[X,X](t) - \frac{V(t)}{U(t)}\,d[X,X](t) = 0.$$


Thus V(t)/U(t) = const. = V(0)/U(0) = 1. 
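
The closed form (8.38) can be compared numerically with a direct Euler discretization of (8.37). The sketch below assumes, purely for illustration, the continuous semimartingale $X(t) = \mu t + \sigma B(t)$; the function name, parameter values and seed are hypothetical choices. On a grid, the Euler product $\prod_k (1 + \Delta X_k)$ and $\exp\bigl(X(t) - \frac{1}{2}\sum_k (\Delta X_k)^2\bigr)$ should agree up to discretization error, and the realized sum of squares should be close to $[X,X](t) = \sigma^2 t$.

```python
import math
import random


def stochastic_exponential_check(mu=0.1, sigma=0.3, t=1.0, n_steps=20_000, seed=7):
    """Compare an Euler scheme for dU = U dX with the closed form (8.38).

    X(t) = mu*t + sigma*B(t) is an assumed example of a continuous semimartingale.
    Returns (euler, closed_form, realized_qv).
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    u = 1.0      # Euler approximation of U(t), with U(0) = 1
    x = 0.0      # X(t) - X(0)
    qv = 0.0     # realized quadratic variation, approximates [X, X](t)
    for _ in range(n_steps):
        dx = mu * dt + sigma * rng.gauss(0.0, sd)
        u += u * dx          # U(t + dt) = U(t) + U(t) * dX
        x += dx
        qv += dx * dx
    closed_form = math.exp(x - 0.5 * qv)
    return u, closed_form, qv


if __name__ == "__main__":
    euler, closed, qv = stochastic_exponential_check()
    print(f"Euler: {euler:.6f}  closed form: {closed:.6f}  [X,X](1) ~ {qv:.4f}")
```

Note that the drift $\mu t$ does not contribute to the quadratic variation, so the correction term in the exponent is $\frac{1}{2}\sigma^2 t$ regardless of $\mu$.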


Properties of the stochastic exponential are given by the following result. 
Theorem 8.13 Let $X$ and $Y$ be semimartingales on the same space. Then
1. $\mathcal{E}(X)(t)\,\mathcal{E}(Y)(t) = \mathcal{E}(X + Y + [X,Y])(t)$.
2. If $X$ is continuous, $X(0) = 0$, then $(\mathcal{E}(X)(t))^{-1} = \mathcal{E}(-X + [X,X])(t)$.
The proof uses the integration by parts formula and is left as an exercise. 
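
A heuristic way to see property 1 (not a proof, and not the integration by parts argument asked for in the exercise) is to look at the discrete analogue, in which a stochastic exponential is a product of factors $1 + \Delta X_k$ over the steps $k$ of a time grid:

```latex
(1 + \Delta X_k)(1 + \Delta Y_k) = 1 + \Delta X_k + \Delta Y_k + \Delta X_k\,\Delta Y_k ,
\qquad\text{so}\qquad
\prod_k (1 + \Delta X_k)\,\prod_k (1 + \Delta Y_k)
  = \prod_k \bigl(1 + \Delta X_k + \Delta Y_k + \Delta X_k\,\Delta Y_k\bigr).
```

The cross terms $\Delta X_k\,\Delta Y_k$ sum to the discrete counterpart of $[X,Y]$, which is why the covariation appears inside the exponential on the right-hand side.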


Example 8.13: (Stock process and its Return process.) 

An application in finance is provided by the relation between the stock process and
its return process. The return is defined by $dR(t) = dS(t)/S(t-)$. Hence the stock
price is the stochastic exponential of the return, $dS(t) = S(t-)\,dR(t)$, and
$S(t) = S(0)\mathcal{E}(R)(t)$.


Stochastic Exponential of Martingales 


The stochastic exponential $U = \mathcal{E}(M)$ of a martingale, or a local martingale, $M(t)$
is a stochastic integral with respect to $M(t)$. Since stochastic integrals with
respect to martingales or local martingales are local martingales, $\mathcal{E}(M)$ is a
local martingale. In applications it is important to have conditions for $\mathcal{E}(M)$
to be a true martingale.


Theorem 8.14 (Martingale exponential) Let $M(t)$, $0 \le t \le T < \infty$, be a
continuous local martingale null at zero. Then its stochastic exponential $\mathcal{E}(M)$
is given by $e^{M(t) - \frac{1}{2}[M,M](t)}$ and it is a continuous positive local martingale.
Consequently, it is a supermartingale, it is integrable and has a finite
non-increasing expectation. It is a martingale if any of the following conditions
holds.

1. $E\left(e^{M(T) - \frac{1}{2}[M,M](T)}\right) = 1$.

2. For all $t \ge 0$, $E\left(\int_0^t e^{2M(s) - [M,M](s)}\,d[M,M](s)\right) < \infty$.

3. For all $t \ge 0$, $E\left(\int_0^t e^{2M(s)}\,d[M,M](s)\right) < \infty$.

Moreover, if the expectations above are bounded by $K < \infty$, then $\mathcal{E}(M)$ is a
square integrable martingale.


PROOF: By Theorem 8.12, $\mathcal{E}(M)(t) = e^{M(t) - \frac{1}{2}[M,M](t)}$, therefore it is
positive. Being a stochastic integral with respect to a martingale, it is a local
martingale. Thus $\mathcal{E}(M)$ is a supermartingale, as a positive local martingale,
see Theorem 7.23. A supermartingale has a non-increasing expectation, and is
a martingale if and only if its expectation at $T$ is the same as at 0 (see Theorem
7.3). This gives the first condition. The second condition follows from Theorem
7.35, which states that if a local martingale has finite expectation of its
quadratic variation, then it is a martingale. So if $E[\mathcal{E}(M), \mathcal{E}(M)](t) < \infty$,
then $\mathcal{E}(M)$ is a martingale. By the quadratic variation of an integral

$$[\mathcal{E}(M), \mathcal{E}(M)](t) = \int_0^t e^{2M(s) - [M,M](s)}\,d[M,M](s). \tag{8.39}$$

The third condition follows from the second, since $[M,M]$ is positive and
increasing. The last statement follows by Theorem 7.35, since the bound
implies $\sup_{t \le T} E[\mathcal{E}(M), \mathcal{E}(M)](t) < \infty$.


Theorem 8.15 (Kazamaki’s condition) Let $M$ be a continuous local martingale
with $M(0) = 0$. If $e^{\frac{1}{2}M(t)}$ is a submartingale, then $\mathcal{E}(M)$ is a martingale.

The result is proven in Revuz and Yor (1999), p. 331.

Theorem 8.16 Let $M$ be a continuous martingale with $M(0) = 0$. If

$$E\left(e^{\frac{1}{2}M(T)}\right) < \infty, \tag{8.40}$$

then $\mathcal{E}(M)$ is a martingale on $[0,T]$.

PROOF: By Jensen’s inequality (see Exercise (7.3)) if $g$ is a convex function,
and $E|g(M(t))| < \infty$ for $t \le T$, then $Eg(M(t)) \le Eg(M(T))$ and $g(M(t))$
is a submartingale. Since $e^{x/2}$ is convex, the result follows by Kazamaki’s
condition.


Theorem 8.17 (Novikov’s condition) Let $M$ be a continuous local martingale
with $M(0) = 0$. Suppose that for each $t \le T$

$$E\left(e^{\frac{1}{2}[M,M](t)}\right) < \infty. \tag{8.41}$$

Then $\mathcal{E}(M)$ is a martingale with mean one. In particular, if for each $t$ there is
a constant $K_t$ such that $[M,M](t) < K_t$, then $\mathcal{E}(M)(t)$, $t \le T$, is a martingale.

PROOF: The condition (8.41) implies that $[M,M](t)$ has moments. By the
BDG inequality (7.41), $\sup_{t \le T} M(t)$ is integrable, therefore $M(t)$ is a
martingale. Next, using the formula for $\mathcal{E}(M)$ and the Cauchy-Schwarz inequality,
$E\sqrt{XY} \le \sqrt{E(X)E(Y)}$,

$$E\left(e^{\frac{1}{2}M(T)}\right) = E\sqrt{\mathcal{E}(M)(T)\,e^{\frac{1}{2}[M,M](T)}} \le \sqrt{E\,\mathcal{E}(M)(T)\;E\left(e^{\frac{1}{2}[M,M](T)}\right)} < \infty. \tag{8.42}$$

The last expression is finite since $E\,\mathcal{E}(M)(T) < \infty$, as $\mathcal{E}(M)$ is a
supermartingale, and by the condition of the theorem. This shows that the condition
(8.40) holds and the result follows.


Another proof can be found in Karatzas and Shreve (1988), p.198.
Now we can complete the missing step in the proof of Lévy’s theorem on
characterization of Brownian motion (Theorem 7.36).

Corollary 8.18 If $M(t)$ is a continuous local martingale with $[M,M](t) = t$,
then $U(t) = \mathcal{E}(uM)(t) = e^{uM(t) - u^2 t/2}$ is a martingale.

PROOF: Clearly $uM(t)$ is a continuous local martingale with quadratic
variation $u^2 t$. The result follows by Novikov’s condition.

However, it is possible to give a proof from first principles, by using the
BDG inequality, and the fact that under the condition (8.10) stochastic
integrals with respect to martingales are martingales. Since $U$ is a stochastic
exponential it satisfies the SDE

$$U(t) = 1 + \int_0^t U(s)\,d(uM(s)).$$

A sufficient condition for a stochastic integral to be a martingale is finiteness
of the expectation of its quadratic variation. Thus it is enough to check the
condition

$$E\int_0^T U^2(t)\,d[uM,uM](t) = u^2 E\int_0^T U^2(t)\,dt < \infty.$$

To see this use the BDG inequality with $p = 2$ (for $U(t) - 1$ to have 0 at $t = 0$).

$$E(U^2(T)) - 1 \le E\left(\sup_{t \le T} U(t)\right)^2 - 1 \le C E[U,U](T) = C u^2 \int_0^T E(U^2(t))\,dt.$$

Let $h(t) = E(U^2(t))$. Then

$$h(T) - 1 \le C u^2 \int_0^T h(t)\,dt.$$

Gronwall’s inequality (Theorem 1.20) implies that $h(T) \le e^{C u^2 T} < \infty$. Thus
$E(U^2(T)) < \infty$, and $U(t)$ is a martingale.
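
Corollary 8.18 implies in particular that $E\,U(T) = 1$ for every $T$. The following is a minimal Monte Carlo sketch for the special case $M = B$ (Brownian motion), sampling $B(t) \sim N(0,t)$ directly rather than simulating paths. The function name, sample size and seed are illustrative assumptions.

```python
import math
import random


def mean_exponential_martingale(u=1.0, t=1.0, n_samples=200_000, seed=11):
    """Monte Carlo estimate of E exp(u*B(t) - u^2 * t / 2) for Brownian motion B."""
    rng = random.Random(seed)
    sd = math.sqrt(t)
    total = 0.0
    for _ in range(n_samples):
        b_t = rng.gauss(0.0, sd)                      # B(t) ~ N(0, t)
        total += math.exp(u * b_t - 0.5 * u * u * t)
    return total / n_samples


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, round(mean_exponential_martingale(t=t), 3))
```

The estimates should stay close to one for each $t$; the sampling error grows with $u^2 t$, since the variance of $U(t)$ is $e^{u^2 t} - 1$.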


The next section gives more information and tools for processes with jumps. 
8.9 Compensators and Sharp Bracket Process


A process $N$ is called increasing if all of its realizations $N(t)$ are non-decreasing
functions of $t$. A process $N$ is of finite variation if all of its realizations $N(t)$ are
functions of finite variation, $V_N(t) < \infty$ for all $t$, where $V_N(t)$ is the variation
of $N$ on $[0,t]$.

Definition 8.19 An increasing process $N(t)$, $t \ge 0$, is called integrable
if $\sup_{t \ge 0} EN(t) < \infty$.

A finite variation process $N$ is of integrable variation if its variation process
is integrable, $\sup_{t \ge 0} EV_N(t) < \infty$.

A finite variation process $N$ is of locally integrable variation if there is a
sequence of stopping times $\tau_n$ such that $\tau_n \uparrow \infty$ so that $N(t \wedge \tau_n)$ is of
integrable variation, that is, $\sup_{t \ge 0} EV_N(t \wedge \tau_n) < \infty$.


Example 8.14: A Poisson process $N(t)$ with parameter $\lambda$ is of finite but not
integrable variation, since for any $t$, $V_N(t) = N(t) < \infty$, but $\sup_{t \ge 0} EV_N(t) = \infty$.
It is of locally integrable variation, since $\sup_{t \ge 0} EV_N(t \wedge n) = \lambda n < \infty$.
Here $\tau_n = n$.


Example 8.15: It can be seen that a finite variation process $N(t)$ with bounded
jumps $|\Delta N(t)| \le c$ is of locally integrable variation. If $\tau_n = \inf\{t : V_N(t) \ge n\}$, then
$N(t \wedge \tau_n)$ has variation bounded by $n + c$. $\tau_n$ are stopping times, as first times of
boundary crossing.


Definition 8.20 Let N(t) be an adapted process of integrable or locally inte- 
grable variation. Its compensator A(t) is the unique predictable process such 
that M(t) = N(t) — A(t) is a local martingale. 


Existence of compensators is assured by the Doob-Meyer decomposition. 


Theorem 8.21 Let N(t) be an adapted process of integrable or locally inte- 
grable variation. Then its compensator exists. Moreover, it is locally inte- 
grable. 


PROOF: As a finite variation process is a difference of two increasing pro- 
cesses, it is enough to establish the result for increasing processes. By localiza- 
tion it is possible to assume that it is integrable. But an increasing integrable 
process is a submartingale, and the result follows by the Doob-Meyer decom- 
position Theorem 8.4. 


Remark 8.5: The condition that $M = N - A$ is a local martingale is equivalent
to the condition (see Liptser and Shiryayev (1989), p.33)

$$E\int_0^\infty H(s)\,dN(s) = E\int_0^\infty H(s)\,dA(s), \tag{8.43}$$

for any positive predictable process $H$. Sometimes this integral condition
(8.43) is taken as the definition of the compensator, e.g. Karr (1986).
The compensator of $N$ is also called the dual predictable projection of $N$
(Rogers and Williams (1990), p.350, Liptser and Shiryayev (1989), p.33).
Note that the compensator is unique with respect to the given filtration and
probability. If the filtration or probability are changed, then the compensator
will also change.


Recall that the quadratic variation process $[X,X](t)$ of a semimartingale
$X$ exists and is non-decreasing. Consider now semimartingales with integrable
($\sup_{t \ge 0} E[X,X](t) < \infty$) or locally integrable quadratic variation.


Definition 8.22 The sharp bracket (or angle bracket, or predictable quadratic
variation) process $\langle X,X\rangle(t)$ of a semimartingale $X$ is the compensator of
$[X,X](t)$. That is, it is the unique predictable process that makes
$[X,X](t) - \langle X,X\rangle(t)$ into a local martingale.


Example 8.16: Let $N$ be a Poisson process. It is of finite variation and changes
only by jumps (pure jump process), which are of size 1, $\Delta N(t) = 0$ or 1, and
$(\Delta N(t))^2 = \Delta N(t)$. Its quadratic variation is the process $N(t)$ itself,

$$[N,N](t) = \sum_{s \le t} (\Delta N(s))^2 = \sum_{s \le t} \Delta N(s) = N(t).$$

Clearly $\sup_{0 \le t \le T} E[N,N](t) = T$. Thus $N$ is of integrable variation on $[0,T]$. $t$ is
non-random, therefore predictable. Since $[N,N](t) - t = N(t) - t$ is a martingale,

$$\langle N,N\rangle(t) = t.$$


Example 8.17: Let $B$ be a Brownian motion. Its quadratic variation is $[B,B](t) = t$,
and since it is non-random, it is predictable. Hence $\langle B,B\rangle(t) = t$, and the
martingale part in the Doob-Meyer decomposition of $[B,B](t)$ is trivial, $M(t) = 0$.


The last example generalizes to any continuous semimartingale. 


Theorem 8.23 If $X(t)$ is a continuous semimartingale with integrable
quadratic variation, then $\langle X,X\rangle(t) = [X,X](t)$, and there is no difference
between the sharp and the square bracket processes.


PROOF: The quadratic variation jumps at the points of jumps of $X$ and
$\Delta[X,X](s) = (\Delta X(s))^2$. Since $X$ has no jumps, $[X,X](t)$ is continuous.
$[X,X](t)$ is predictable as a continuous and adapted process, the martingale
part in the Doob-Meyer decomposition of $[X,X](t)$ is trivial, $M(t) = 0$, and
$\langle X,X\rangle(t) = [X,X](t)$.


Example 8.18: Let $X$ be a diffusion solving the SDE
$dX(t) = \mu(X(t))\,dt + \sigma(X(t))\,dB(t)$. Then
$[X,X](t) = \int_0^t \sigma^2(X(s))\,ds = \langle X,X\rangle(t)$.
Sharp Bracket for Square Integrable Martingales

Let $M$ be a square integrable martingale, that is, $\sup_t EM^2(t) < \infty$. Recall
that the quadratic variation of $M$ has the property

$$M^2(t) - [M,M](t) \text{ is a martingale.} \tag{8.44}$$

$M^2$ is a submartingale, since $x^2$ is a convex function. Using the Doob-Meyer
decomposition for submartingales, Theorem 8.4, we can prove the following


Theorem 8.24 Let $M$ be a square integrable martingale. Then the sharp
bracket process $\langle M,M\rangle(t)$ is the unique predictable increasing process for which

$$M^2(t) - \langle M,M\rangle(t) \text{ is a martingale.} \tag{8.45}$$


PROOF: By the definition of the sharp bracket process, $[M,M](t) - \langle M,M\rangle(t)$
is a martingale. Writing $M^2(t) - \langle M,M\rangle(t) = (M^2(t) - [M,M](t)) + ([M,M](t) - \langle M,M\rangle(t))$,
a sum of two martingales by (8.44), we see that $M^2(t) - \langle M,M\rangle(t)$ is also
a martingale. Since $\langle M,M\rangle(t)$ is predictable and $M^2(t)$ is a submartingale,
uniqueness follows by the Doob-Meyer decomposition.


By taking expectations in (8.45) we obtain a useful corollary. 


Corollary 8.25 Let $M(t)$ be a square integrable martingale. Then

$$E(M^2(t)) = E[M,M](t) = E\langle M,M\rangle(t). \tag{8.46}$$
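
For the compensated Poisson process $M(t) = N(t) - t$ of rate one, $[M,M](t) = N(t)$ and $\langle M,M\rangle(t) = t$, so all three quantities in (8.46) equal $t$. The sketch below checks this by simulation on a finite horizon; the helper names, horizon, number of paths and seed are illustrative assumptions.

```python
import random


def poisson_count(t, rng):
    """Number of jumps of a rate-1 Poisson process on [0, t]."""
    n, clock = 0, rng.expovariate(1.0)
    while clock <= t:
        n += 1
        clock += rng.expovariate(1.0)   # add the next exponential waiting time
    return n


def compensated_poisson_moments(t=2.0, n_paths=50_000, seed=5):
    """Estimate E M(t), E M(t)^2 and E [M, M](t) = E N(t) for M(t) = N(t) - t."""
    rng = random.Random(seed)
    s1 = s2 = sn = 0.0
    for _ in range(n_paths):
        n = poisson_count(t, rng)
        m = n - t
        s1 += m
        s2 += m * m
        sn += n
    return s1 / n_paths, s2 / n_paths, sn / n_paths


if __name__ == "__main__":
    mean_m, mean_m2, mean_qv = compensated_poisson_moments()
    print(f"E M(t) ~ {mean_m:.3f}, E M(t)^2 ~ {mean_m2:.3f}, E [M,M](t) ~ {mean_qv:.3f}")
```

With $t = 2$ the last two estimates should both be close to 2, the value of $\langle M,M\rangle(t) = t$, while the first should be close to zero.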


This result allows us to use Doob’s martingale inequality with the sharp
bracket

$$E\left(\left(\sup_{s \le T} M(s)\right)^2\right) \le 4E(M^2(T)) = 4E(\langle M,M\rangle(T)). \tag{8.47}$$


By using localization in the proof of Theorem 8.24 one can show 


Theorem 8.26 Let $M$ be a locally square integrable martingale, then the
predictable quadratic variation $\langle M,M\rangle(t)$ is the unique predictable process for
which $M^2(t) - \langle M,M\rangle(t)$ is a local martingale.


The next result allows us to decide when a local martingale is a martingale by using
the predictable quadratic variation.


Theorem 8.27 Let $M(t)$, $0 \le t \le T < \infty$, be a local martingale such that for
all $t$, $E\langle M,M\rangle(t) < \infty$. Then $M$ is a square integrable martingale, moreover
$EM^2(t) = E[M,M](t) = E\langle M,M\rangle(t)$. If $T = \infty$, and
$\sup_{t < \infty} E\langle M,M\rangle(t) < \infty$, then $M(t)$ is a square integrable martingale on $[0,\infty)$.

PROOF: $[M,M](t) - \langle M,M\rangle(t)$ is a local martingale. Let $\tau_n$ be a
localizing sequence. Then $E[M,M](t \wedge \tau_n) = E\langle M,M\rangle(t \wedge \tau_n)$. Since both
sides are non-decreasing, they converge to the same limit as $n \to \infty$. But
$\lim_{n \to \infty} E\langle M,M\rangle(t \wedge \tau_n) = E\langle M,M\rangle(t) < \infty$. Therefore
$E[M,M](t) = E\langle M,M\rangle(t) < \infty$. Thus the conditions of Theorem 7.35 are
satisfied and the result follows.


Since a continuous local martingale is locally square integrable, we obtain 


Corollary 8.28 The sharp bracket process (predictable quadratic variation) 
for a continuous local martingale exists. 


Continuous Martingale Component of a Semimartingale 


A function of finite variation has a decomposition into continuous and discrete 
parts. A semimartingale is a sum of a process of finite variation and a local 
martingale. It turns out that one can decompose any local martingale into a 
continuous local martingale and a purely discontinuous one. Such decomposi- 
tion requires a different approach to the case of finite variation processes. 


Definition 8.29 A local martingale is purely discontinuous if it is orthogonal 
to any continuous local martingale. Local martingales M and N are orthogonal 
if MN is a local martingale. 


Example 8.19: A compensated Poisson process $\bar{N}(t) = N(t) - t$ is a purely
discontinuous martingale. Let $M(t)$ be any continuous local martingale. Then by the
integration by parts formula (8.24)

$$M(t)\bar{N}(t) = \int_0^t M(s-)\,d\bar{N}(s) + \int_0^t \bar{N}(s-)\,dM(s) + [M,\bar{N}](t).$$

Since $\bar{N}$ is of finite variation, by the property (8.19) of quadratic covariation
$[M,\bar{N}](t) = \sum_{s \le t} \Delta M(s)\Delta\bar{N}(s)$. But $M$ is continuous, $\Delta M(s) = 0$, and
$[M,\bar{N}](t) = 0$. Therefore, $M(t)\bar{N}(t)$ is a sum of two stochastic integrals with respect
to local martingales, and itself is a local martingale.


It is possible to show that any local martingale $M$ has a unique decomposition

$$M = M^c + M^d,$$

where $M^c$ is a continuous and $M^d$ a purely discontinuous local martingale
(see for example Liptser and Shiryayev (1989), p.39, Protter (1992), Jacod
and Shiryaev (1987)).
If $X$ is a semimartingale with representation

$$X(t) = X(0) + M(t) + A(t), \tag{8.48}$$

with a local martingale $M$, then $M^c$ is called the continuous martingale
component of $X$ and is denoted by $X^{cm}$. Even if the above representation of $X$
is not unique, the continuous martingale component of $X$ is the same for all
representations. Indeed, if $X(t) = X(0) + M_1(t) + A_1(t)$ is another
representation, then $(M - M_1)(t) = -(A - A_1)(t)$. Hence $M - M_1$ is a martingale of
finite variation. Hence its continuous component is also a martingale of finite
variation. But a continuous martingale of finite variation is a constant. This
implies that $M^c - M_1^c = 0$. Thus $X^{cm} = M^c$ is the same for all representations.
If $X$ is of finite variation then the martingale part is zero and by the
uniqueness of $X^{cm}$ we have


Corollary 8.30 If $X$ is a semimartingale of finite variation, then $X^{cm} = 0$.


For example, the compensated Poisson process N(t) — t has zero continuous 
martingale component. 
It can be shown that

$$\langle X^{cm}, X^{cm}\rangle = [X,X]^c, \tag{8.49}$$

where $[X,X]^c$ is the continuous part of the finite variation process $[X,X]$. Of
course, because $X^{cm}$ is continuous, $\langle X^{cm}, X^{cm}\rangle = [X^{cm}, X^{cm}]$.

Let $\Delta X(s) = X(s) - X(s-)$ and put $X(0-) = 0$ and $[X,X]^c(0) = 0$. Since
the jumps of quadratic variation satisfy (see (8.18))

$$\Delta[X,X](s) = (\Delta X(s))^2,$$

we have

$$[X,X](t) = [X,X]^c(t) + \sum_{s \le t} \Delta[X,X](s) = [X,X]^c(t) + \sum_{s \le t} (\Delta X(s))^2 = \langle X^{cm}, X^{cm}\rangle(t) + \sum_{s \le t} (\Delta X(s))^2. \tag{8.50}$$

Since the quadratic variation $[X,X]$ for a semimartingale exists, and the
predictable quadratic variation $\langle X^{cm}, X^{cm}\rangle$ exists (Corollary 8.28) we obtain


Corollary 8.31 If $X$ is a semimartingale then for each $t$

$$\sum_{s \le t} (\Delta X(s))^2 < \infty. \tag{8.51}$$


Conditions for Existence of a Stochastic Integral 


The class of processes $H$ for which the stochastic integral with respect to a
martingale $M$ can be defined depends in an essential way on the properties
of the predictable quadratic variation $\langle M,M\rangle$ of $M$. Consider integrals with
respect to a locally square integrable martingale $M$, possibly discontinuous.
The stochastic integral $\int_0^T H(s)\,dM(s)$ can be defined for predictable processes
$H$ such that

$$\int_0^T H^2(t)\,d\langle M,M\rangle(t) < \infty, \tag{8.52}$$

and in this case the integral $\int_0^t H(s)\,dM(s)$, $0 \le t \le T$, is a local martingale.
The class of processes $H$ that can be integrated against $M$ is wider when
$\langle M,M\rangle(t)$ is continuous, and even wider when $\langle M,M\rangle(t)$ is absolutely
continuous (can be represented as an integral with respect to $dt$). These classes
and conditions are given, for example, in Liptser and Shiryaev (2001), p. 191.


Example 8.20: Let the filtration $\mathbb{F}$ be generated by a Brownian motion $B(t)$ and
a Poisson process $N(t)$. The process $N(t-)$ is a left-continuous modification of
$N(t)$. By definition, $N(t-) = \lim_{s \uparrow t} N(s)$. Being left-continuous it is predictable.
Condition (8.52) holds. The integral $\int_0^T N(s-)\,dB(s)$ is a well defined stochastic
integral of a predictable process with respect to a martingale $B$.


Properties of the Predictable Quadratic Variation 


The predictable quadratic variation (the sharp bracket process) has similar 
properties to the quadratic variation (the square bracket) process. We list them 
without proof. All the processes below are assumed to be semimartingales with 
locally integrable quadratic variation. 


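1. $\langle X,X\rangle(t)$ is increasing in $t$.
2. $\langle X,Y\rangle$ is bilinear and symmetric,

$$\langle \alpha X + Y, \beta U + V\rangle = \alpha\beta\langle X,U\rangle + \alpha\langle X,V\rangle + \beta\langle Y,U\rangle + \langle Y,V\rangle. \tag{8.53}$$

3. Polarization identity. $\langle X,Y\rangle = \frac{1}{4}\left(\langle X+Y,X+Y\rangle - \langle X-Y,X-Y\rangle\right)$.
4. $\langle X,Y\rangle$ is a predictable process of finite variation.
5. $\langle X,Y\rangle = 0$ if $X$ or $Y$ is of finite variation and one of them is continuous.
6. The sharp bracket process of stochastic integrals $\langle H \cdot X, K \cdot Y\rangle(t)$.

$$\left\langle \int_0^\cdot H(s)\,dX(s), \int_0^\cdot K(s)\,dY(s)\right\rangle(t) = \int_0^t H(s)K(s)\,d\langle X,Y\rangle(s), \tag{8.54}$$

in particular $\left\langle \int_0^\cdot H(s)\,dX(s), \int_0^\cdot H(s)\,dX(s)\right\rangle(t) = \int_0^t H^2(s)\,d\langle X,X\rangle(s)$, and
$\left\langle \int_0^\cdot H(s)\,dX(s), Y\right\rangle(t) = \left\langle \int_0^\cdot H(s)\,dX(s), \int_0^\cdot 1\,dY(s)\right\rangle(t) = \int_0^t H(s)\,d\langle X,Y\rangle(s)$.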
Recall that stochastic integrals with respect to local martingales are again
local martingales. Using the sharp bracket of the integral together with
Theorem 8.27 we obtain


Theorem 8.32 Let $M(t)$, $0 \le t \le T$, be a local martingale and $H(t)$ be a
predictable process such that $E\left(\int_0^T H^2(s)\,d\langle M,M\rangle(s)\right) < \infty$. Then
$\int_0^t H(s)\,dM(s)$ is a square integrable martingale, moreover

$$\left\langle \int_0^\cdot H(s)\,dM(s), \int_0^\cdot H(s)\,dM(s)\right\rangle(t) = \int_0^t H^2(s)\,d\langle M,M\rangle(s). \tag{8.55}$$


Using this result, we obtain the isometry property for stochastic integrals in
terms of the sharp bracket process.

$$E\left(\int_0^t H(s)\,dM(s)\right)^2 = E\left(\int_0^t H^2(s)\,d\langle M,M\rangle(s)\right). \tag{8.56}$$


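Example 8.21: Let $M(t)$ be the compensated Poisson process, $M(t) = N(t) - t$, and
$H$ be predictable, satisfying $E\int_0^T H^2(t)\,dt < \infty$. Then $\int_0^t H(s)\,dM(s)$ is a
martingale, moreover

$$E\left(\int_0^t H(s)\,dM(s)\right) = 0, \quad \text{and} \quad E\left(\int_0^t H(s)\,dM(s)\right)^2 = E\int_0^t H^2(s)\,ds.$$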
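
As an illustration of Example 8.21, take the predictable integrand $H(s) = N(s-)$ (an assumed choice for this sketch, not taken from the text). On a path with jump times $T_1 < \dots < T_n \le t$ one has $\int_0^t N(s-)\,dN(s) = n(n-1)/2$ and $\int_0^t N(s)\,ds = \sum_k (t - T_k)$, so the integral against $M$ can be evaluated exactly on each path. Since $EN^2(s) = s + s^2$, the isometry predicts the second moment $t^2/2 + t^3/3$. Function names, sample size and seed below are hypothetical.

```python
import random


def integral_against_compensated_poisson(t, rng):
    """Pathwise value of int_0^t N(s-) dM(s), M(s) = N(s) - s, rate-1 Poisson N."""
    n = 0
    time_integral = 0.0                  # int_0^t N(s) ds = sum over jumps of (t - T_k)
    clock = rng.expovariate(1.0)
    while clock <= t:
        n += 1
        time_integral += t - clock
        clock += rng.expovariate(1.0)
    jump_integral = n * (n - 1) / 2.0    # int_0^t N(s-) dN(s) = 0 + 1 + ... + (n - 1)
    return jump_integral - time_integral


def isometry_check(t=1.0, n_paths=200_000, seed=13):
    """Return (sample mean, sample second moment, predicted second moment)."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n_paths):
        v = integral_against_compensated_poisson(t, rng)
        s1 += v
        s2 += v * v
    return s1 / n_paths, s2 / n_paths, t ** 2 / 2.0 + t ** 3 / 3.0


if __name__ == "__main__":
    mean, second, predicted = isometry_check()
    print(f"mean ~ {mean:.4f}, second moment ~ {second:.4f}, predicted {predicted:.4f}")
```

The sample mean should be near zero (martingale property) and the sample second moment near the predicted value; the latter converges slowly because rare paths with many jumps contribute large values.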


8.10 Itô’s Formula for Semimartingales 


Let $X(t)$ be a semimartingale and $f$ be a $C^2$ function. Then $f(X(t))$ is a
semimartingale, and Itô’s formula holds

$$\begin{aligned} f(X(t)) - f(X(0)) = {} & \int_0^t f'(X(s-))\,dX(s) + \frac{1}{2}\int_0^t f''(X(s-))\,d[X,X](s) \\ & + \sum_{s \le t}\left(f(X(s)) - f(X(s-)) - f'(X(s-))\Delta X(s) - \frac{1}{2} f''(X(s-))(\Delta X(s))^2\right). \end{aligned} \tag{8.57}$$

The quadratic variation $[X,X]$ jumps at the points of jumps of $X$ and its
jumps are $\Delta[X,X](s) = (\Delta X(s))^2$. Thus the jump part of the integral
$\int_0^t f''(X(s-))\,d[X,X](s)$ is given by $\sum_{s \le t} f''(X(s-))(\Delta X(s))^2$, leading to an
equivalent form of the formula

$$\begin{aligned} f(X(t)) - f(X(0)) = {} & \int_0^t f'(X(s-))\,dX(s) + \frac{1}{2}\int_0^t f''(X(s-))\,d[X,X]^c(s) \\ & + \sum_{s \le t}\left(f(X(s)) - f(X(s-)) - f'(X(s-))\Delta X(s)\right), \end{aligned} \tag{8.58}$$

where $[X,X]^c$ is the continuous component of the finite variation function
$[X,X]$. Using the relationship between the square and the sharp brackets
(8.50), we can write Itô’s formula with the sharp bracket process of $X$, provided
the sharp bracket exists,

$$\begin{aligned} f(X(t)) - f(X(0)) = {} & \int_0^t f'(X(s-))\,dX(s) + \frac{1}{2}\int_0^t f''(X(s-))\,d\langle X^{cm}, X^{cm}\rangle(s) \\ & + \sum_{s \le t}\left(f(X(s)) - f(X(s-)) - f'(X(s-))\Delta X(s)\right), \end{aligned} \tag{8.59}$$

where $X^{cm}$ denotes the continuous martingale part of $X$.


Example 8.22: Let $N(t)$ be a Poisson process. We calculate $\int_0^t N(s-)\,dN(s)$. The
answer can be derived from the integration by parts formula (8.24), but now we use
(8.59). Since $(N(t) - t)^{cm} = 0$ (by Corollary 8.30)

$$N^2(t) = 2\int_0^t N(s-)\,dN(s) + \sum_{s \le t}\left(N^2(s) - N^2(s-) - 2N(s-)\Delta N(s)\right).$$

Since $N(s) = N(s-) + \Delta N(s)$,
$(N(s-) + \Delta N(s))^2 - N^2(s-) - 2N(s-)\Delta N(s) = (\Delta N(s))^2 = \Delta N(s)$, and the sum
simplifies to $\sum_{s \le t} \Delta N(s) = N(t)$. Thus we obtain

$$\int_0^t N(s-)\,dN(s) = \frac{1}{2}\left(N^2(t) - N(t)\right).$$
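
The result of Example 8.22 can be checked path by path: at the $k$-th jump the integrand $N(s-)$ equals $k - 1$ and $dN$ has size one, so the integral is the finite sum $0 + 1 + \dots + (N(t) - 1)$. A small sketch of this bookkeeping (the function name is illustrative):

```python
def integral_n_minus_dn(n_jumps):
    """int_0^t N(s-) dN(s) on a path with n_jumps jumps in [0, t].

    At the k-th jump (k = 1, ..., n_jumps) the integrand N(s-) equals k - 1
    and dN has size 1, so the integral is a finite sum.
    """
    return sum(k - 1 for k in range(1, n_jumps + 1))


if __name__ == "__main__":
    for n in range(6):
        # second column: pathwise integral; third: (N^2 - N) / 2 from Example 8.22
        print(n, integral_n_minus_dn(n), (n * n - n) // 2)
```

The value depends only on the number of jumps, not on their times, which is why the answer is a function of $N(t)$ alone.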


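Formula (8.59) for a function of $n$ variables reads: if $X(t) = (X^1(t), \ldots, X^n(t))$
is a semimartingale and $f$ is a $C^2$ function of $n$ variables, then

$$\begin{aligned} f(X(t)) - f(X(0)) = {} & \sum_{i=1}^n \int_0^t \frac{\partial f}{\partial x_i}(X(s-))\,dX^i(s) + \frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X(s-))\,d\langle X^{i,cm}, X^{j,cm}\rangle(s) \\ & + \sum_{s \le t}\left(f(X(s)) - f(X(s-)) - \sum_{i=1}^n \frac{\partial f}{\partial x_i}(X(s-))\Delta X^i(s)\right). \end{aligned} \tag{8.60}$$

Itô’s formula can be found in many texts, Protter (1992), p. 71, Rogers and
Williams (1990), p. 394, Liptser and Shiryayev (1989), Métivier (1982),
Dellacherie and Meyer (1982).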
8.11 Stochastic Exponential and Logarithm


As an application of Itô’s formula and the rules of stochastic calculus we outline
a proof of the following result.


Theorem 8.33 Let $X$ be a semimartingale. Then the stochastic equation

$$U(t) = 1 + \int_0^t U(s-)\,dX(s) \tag{8.61}$$

has a unique solution, called the stochastic exponential of $X$, and this solution
is given by

$$U(t) = \mathcal{E}(X)(t) = e^{X(t) - X(0) - \frac{1}{2}\langle X,X\rangle^c(t)} \prod_{s \le t} (1 + \Delta X(s)) e^{-\Delta X(s)}. \tag{8.62}$$

Formula (8.62) can be written by using quadratic variation as follows

$$\mathcal{E}(X)(t) = e^{X(t) - X(0) - \frac{1}{2}[X,X](t)} \prod_{s \le t} (1 + \Delta X(s)) e^{-\Delta X(s) + \frac{1}{2}(\Delta X(s))^2}. \tag{8.63}$$


PROOF: Let $Y(t) = X(t) - X(0) - \frac{1}{2}\langle X,X\rangle^c(t)$ and
$V(t) = \prod_{s \le t} (1 + \Delta X(s)) e^{-\Delta X(s)}$. Note that although the product is taken for
all $s \le t$, there are at most countably many points at which $\Delta X(s) \ne 0$ (by the
regularity property of the process), hence there are at most countably many
elements different from 1 in the product. We show that the product converges.
Since by (8.51) $\sum_{s \le t} (\Delta X(s))^2 < \infty$, there are only finitely many points $s$ at
which $|\Delta X(s)| > 0.5$, which give a finite non-zero contribution to the product.
Taking the product over $s$ at which $|\Delta X(s)| \le 1/2$, and taking logarithm,
it is enough to show that $\sum_{s \le t} |\ln(1 + \Delta X(s)) - \Delta X(s)|$ converges. But this
follows from the inequality $|\ln(1 + \Delta X(s)) - \Delta X(s)| \le (\Delta X(s))^2$ by (8.51).
To see that $U(t)$ defined by (8.62) satisfies (8.61) use Itô’s formula applied to
the function $f(Y(t), V(t))$ with $f(x_1, x_2) = e^{x_1} x_2$. For the uniqueness of the
solution of (8.61) and other details see Liptser and Shiryayev (1989), p.123.


Example 8.23: The stochastic exponential (8.62) of a Poisson process is easily
seen to be $\mathcal{E}(N)(t) = 2^{N(t)}$.
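
For a pure-jump path of finite variation with $X(0) = 0$ and no continuous part, $X(t) = \sum_{s \le t} \Delta X(s)$, so (8.62) reduces to $\prod_{s \le t} (1 + \Delta X(s))$. The sketch below evaluates (8.62) literally for a list of jump sizes and compares it with solving (8.61) jump by jump; for a Poisson path both give $2^{N(t)}$. The function names and the representation of a path by its list of jump sizes are assumptions made for this illustration.

```python
import math


def stochastic_exponential_formula(jumps):
    """Evaluate (8.62) for a pure-jump path X with X(0) = 0 and no continuous part."""
    x_t = sum(jumps)                       # X(t) - X(0)
    product = 1.0
    for dx in jumps:
        product *= (1.0 + dx) * math.exp(-dx)
    return math.exp(x_t) * product


def stochastic_exponential_recursion(jumps):
    """Solve U(t) = 1 + int_0^t U(s-) dX(s): at each jump U increases by U(s-) * dX."""
    u = 1.0
    for dx in jumps:
        u += u * dx
    return u


if __name__ == "__main__":
    poisson_jumps = [1.0] * 5              # a Poisson path with N(t) = 5
    print(stochastic_exponential_formula(poisson_jumps))     # approximately 2**5
    print(stochastic_exponential_recursion(poisson_jumps))   # 32.0
```

Between jumps $X$ is constant on such a path, so $U$ is constant as well, and only the jump sizes (not the jump times) matter.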

If $U = \mathcal{E}(X)$ is the stochastic exponential of $X$, then $X = \mathcal{L}(U)$ is the
stochastic logarithm of $U$, satisfying equation (8.61)

$$dX(t) = \frac{dU(t)}{U(t-)},$$

or $\mathcal{L}(\mathcal{E}(X)) = X$.


For Itô processes an expression for $X(t)$ is given in Theorem 5.3, for the general
case see Exercise 8.17.
8.12 Martingale (Predictable) Representations


In this section we give results on representation of martingales by stochastic
integrals of predictable processes, also called predictable representations. Let
$M(t)$ be a martingale, $0 \le t \le T$, adapted to the filtration $\mathbb{F} = (\mathcal{F}_t)$, and $H(t)$
be a predictable process satisfying $\int_0^T H^2(s)\,d\langle M,M\rangle(s) < \infty$ with probability
one. Then $\int_0^t H(s)\,dM(s)$ is a local martingale. The predictable representation
property means that the converse is also true. Let $\mathbb{F}^M = (\mathcal{F}_t^M)$ denote the
natural filtration of $M$.


Definition 8.34 A local martingale $M$ has the predictable representation
property if for any $\mathbb{F}^M$-local martingale $X$ there is a predictable process $H$ such
that

$$X(t) = X(0) + \int_0^t H(s)\,dM(s). \tag{8.64}$$

This definition is different from the classical one for martingales with jumps, see
Remark 8.7 below, but is the same for continuous martingales.

Brownian motion has the predictable representation property (see, for example
Revuz and Yor (1999) p. 209, Liptser and Shiryayev (2001) I p. 170).


Theorem 8.35 (Brownian Martingale Representation)
Let $X(t)$, $0 \le t \le T$, be a local martingale adapted to the Brownian
filtration $\mathbb{F}^B = (\mathcal{F}_t)$. Then there exists a predictable process $H(t)$ such that
$\int_0^T H^2(s)\,ds < \infty$ with probability one, and equation (8.65) holds

$$X(t) = X(0) + \int_0^t H(s)\,dB(s). \tag{8.65}$$

Moreover, if $Y$ is an integrable $\mathcal{F}_T$-measurable random variable, $E|Y| < \infty$,
then

$$Y = EY + \int_0^T H(t)\,dB(t). \tag{8.66}$$

If in addition, $Y$ and $B$ have jointly a Gaussian distribution, then the process
$H(t)$ in (8.66) is deterministic.


PROOF: We don’t prove the representation of a martingale, but only the
representation for a random variable based on it.

Take $X(t) = E(Y|\mathcal{F}_t)$. Then $X(t)$, $0 \le t \le T$, is a martingale (see Theorem
7.9). Hence by the martingale representation there exists $H$, such that
$X(t) = X(0) + \int_0^t H(s)\,dB(s)$. Taking $t = T$ gives the result, since $X(T) = Y$
and $X(0) = EY$.
Remark 8.6: A functional of the path of the Brownian motion $B_{[0,T]}$ is a
random variable $Y$, $\mathcal{F}_T$-measurable. Theorem 8.35 states that under the above
assumptions, any functional of Brownian motion has the form (8.66).


Since Itô integrals are continuous, and any local martingale of a Brownian
filtration is an Itô integral, it follows that all local martingales of a Brownian
filtration are continuous. In fact we have the following result (the second
statement is not straightforward)


Corollary 8.36 


1. All local martingales of the Brownian filtration are continuous. 
2. All right-continuous adapted processes are predictable. 


Corollary 8.37
Let $X(t)$, $0 \le t \le T$, be a square integrable martingale adapted to the Brownian
filtration $\mathbb{F}^B$. Then there exists a predictable process $H(t)$ such that
$E\int_0^T H^2(s)\,ds < \infty$ and representation (8.65) holds. Moreover,

$$\langle X,B\rangle(t) = \int_0^t H(s)\,ds, \quad \text{and} \quad H(t) = \frac{d\langle X,B\rangle(t)}{dt}. \tag{8.67}$$

The equation (8.67) follows from (8.65) by the rule of the sharp bracket for
integrals.


Example 8.24: (Representation of martingales)
1. $X(t) = B^2(t) - t$. Then $X(t) = \int_0^t 2B(s)\,dB(s)$. Here $H(t) = 2B(t)$, which can
also be found by using (8.67).
2. Let $X(t) = f(B(t),t)$ be a martingale. By Itô’s formula
$dX(t) = \frac{\partial f}{\partial x}(B(t),t)\,dB(t)$. Thus $H(t) = \frac{\partial f}{\partial x}(B(t),t)$. This also shows that

$$\frac{d\langle f(B,t), B\rangle(t)}{dt} = \frac{\partial f}{\partial x}(B(t),t).$$

Example 8.25: (Representation of random variables)
1. If $Y = \int_0^T B(s)\,ds$, then $Y = \int_0^T (T - s)\,dB(s)$.
2. $Y = B^2(1)$. Then $M(t) = E(B^2(1)|\mathcal{F}_t) = B^2(t) + (1 - t)$. Using Itô’s formula for
$M(t)$ we obtain $B^2(1) = 1 + 2\int_0^1 B(t)\,dB(t)$.
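
Both representations in Example 8.25 can be checked on a simulated path by replacing the integrals with left-endpoint sums. In the sketch below the grid size, seed and function name are illustrative assumptions, and the two sides of each identity agree only up to discretization error.

```python
import math
import random


def representation_check(t_end=1.0, n_steps=10_000, seed=17):
    """Discretized check of Example 8.25 on one Brownian path.

    Returns (time_integral, ito_time, b_squared, ito_square) where
      time_integral ~ int_0^T B(s) ds,    ito_time   ~ int_0^T (T - s) dB(s),
      b_squared     = B(T)^2,             ito_square ~ T + 2 int_0^T B(s) dB(s).
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    sd = math.sqrt(dt)
    b = 0.0
    time_integral = ito_time = ito_b = 0.0
    for k in range(n_steps):
        db = rng.gauss(0.0, sd)
        time_integral += b * dt               # left Riemann sum for int B ds
        ito_time += (t_end - k * dt) * db     # Ito sum for int (T - s) dB
        ito_b += b * db                       # Ito sum for int B dB
        b += db
    return time_integral, ito_time, b * b, t_end + 2.0 * ito_b


if __name__ == "__main__":
    y1, y2, sq1, sq2 = representation_check()
    print(f"int B ds = {y1:.5f}   int (T-s) dB = {y2:.5f}")
    print(f"B(T)^2   = {sq1:.5f}   T + 2 int B dB = {sq2:.5f}")
```

In the first identity the integrand $T - s$ is deterministic, in line with the statement of Theorem 8.35 for jointly Gaussian $Y$ and $B$; in the second the integrand $2B(t)$ is random.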


Similar results hold for the Poisson process filtration. 


Theorem 8.38 (Poisson Martingale Representation)
Let $M(t)$, $0 \le t \le T$, be a local martingale adapted to the Poisson filtration.
Then there exists a predictable process $H(t)$ such that

$$M(t) = M(0) + \int_0^t H(s)\,d\bar{N}(s), \tag{8.68}$$

where $\bar{N}(t) = N(t) - t$ is the compensated Poisson process.
When a filtration is larger than the natural filtration of a martingale, then there
is the following result (Revuz and Yor (1999) p. 209, Liptser and Shiryaev
(2001) p. 170).


Theorem 8.39 Let $M(t)$, $0 \le t \le T$, be any continuous local martingale, and
$X$ a continuous $\mathbb{F}^M$-local martingale. Then $X$ has a representation

$$X(t) = X(0) + \int_0^t H(s)\,dM(s) + Z(t), \tag{8.69}$$

where $H$ is predictable and $\langle M,Z\rangle = 0$ (consequently $\langle X - Z, Z\rangle = 0$).


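Example 8.26: Let $\mathbb{F}$ be generated by two independent Brownian motions $B$
and $W$, and let $M(t) = \int_0^t W(s)\,dB(s)$. It is a martingale, as a stochastic integral
satisfying $E\int_0^T W^2(s)\,ds < \infty$. We show that $M$ does not have the predictable
representation property.
$\langle M,M\rangle(t) = \int_0^t W^2(s)\,ds$. Hence $W^2(t) = \frac{d\langle M,M\rangle(t)}{dt}$, which shows that $W^2(t)$ is
$\mathcal{F}_t^M$-measurable. Hence the martingale $X(t) = W^2(t) - t$ is adapted to $\mathcal{F}_t^M$, but is not
an integral of a predictable process with respect to $M$. By Itô’s formula
$X(t) = 2\int_0^t W(s)\,dW(s)$. Hence $\langle X,M\rangle(t) = 2\int_0^t W^2(s)\,d\langle W,B\rangle(s) = 0$. Suppose
there is $H$, such that $X(t) = \int_0^t H(s)\,dM(s)$. Then by (8.67)
$H(t) = \frac{d\langle X,M\rangle(t)}{d\langle M,M\rangle(t)} = 0$, implying that $X(t) = 0$ on $[0,T]$, which is a contradiction.

This example has an application in Finance, it shows non-completeness of a
stochastic volatility model.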


Example 8.27: Let $\mathbb{F}$ be generated by a Brownian motion $B$ and a Poisson
process $N$, and let $M(t) = B(t) + N(t) - t = B(t) + \bar N(t)$, where $\bar N(t) = N(t) - t$.
$M$ is a martingale, as a sum of two martingales. We show that $M$ does not have the
predictable representation property.

$[M, M](t) = [B, B](t) + [N, N](t) = t + N(t)$. This shows that $N(t) = [M, M](t) - t$
is $\mathcal{F}^M_t$-measurable. Hence $B(t) = M(t) - N(t) + t$ is $\mathcal{F}^M_t$-measurable. Thus the
martingale $X(t) = \int_0^t N(s-)\,dB(s)$ is $\mathcal{F}^M$-measurable, but it does not have a
predictable representation. If it did, then

$$\int_0^t N(s-)\,dB(s) = \int_0^t H(s)\,dB(s) + \int_0^t H(s)\,d\bar N(s)$$

and $\int_0^t (N(s-) - H(s))\,dB(s) = \int_0^t H(s)\,d\bar N(s)$. Since the integral on the right-hand side is of
finite variation, $H(s) = N(s-)$ for almost all $s$. Thus $\int_0^t H(s)\,d\bar N(s) = 0$. This is
the same as $\int_0^t N(s-)\,dN(s) = \int_0^t N(s-)\,ds$. But this is impossible. To see the
contradiction, let $t = T_2$, the time of the second jump of $N$. Then $\int_0^{T_2} N(s-)\,dN(s) = 1$
and $\int_0^{T_2} N(s-)\,ds = T_2 - T_1$.

This example shows non-completeness of models of stock prices with jumps. It 
can be generalized to a model that supports a martingale with a jump component. 
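
The contradiction at the end of Example 8.27 can be checked path by path. The following is a minimal sketch, not part of the original text: the helper names `poisson_jump_times`, `integral_dN` and `integral_ds`, the rate 1 and the seed are illustrative assumptions. It evaluates $\int_0^t N(s-)\,dN(s)$ and $\int_0^t N(s-)\,ds$ for a simulated list of jump times.

```python
import random

def poisson_jump_times(n_jumps, rate=1.0, seed=0):
    """Simulate the first n_jumps arrival times of a Poisson process (assumed rate)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_jumps):
        t += rng.expovariate(rate)
        times.append(t)
    return times

def integral_dN(jump_times, t):
    """Integral of N(s-) against dN(s) over [0, t]: sum of N(T_k-) over jumps T_k <= t."""
    # With 0-based index k, the value just before the (k+1)-th jump is N(T-) = k.
    return sum(k for k, T in enumerate(jump_times) if T <= t)

def integral_ds(jump_times, t):
    """Lebesgue integral of N(s-) over [0, t] for a piecewise constant path."""
    total = 0.0
    for k, T in enumerate(jump_times):
        left = min(T, t)
        nxt = jump_times[k + 1] if k + 1 < len(jump_times) else float("inf")
        right = min(nxt, t)
        total += (k + 1) * (right - left)  # N(s-) = k + 1 on (T_{k+1}, T_{k+2}]
    return total

times = poisson_jump_times(5, seed=3)
T1, T2 = times[0], times[1]
print(integral_dN(times, T2), integral_ds(times, T2), T2 - T1)
```

At $t = T_2$ the first integral is always 1, while the second is the random gap $T_2 - T_1$, so the two cannot agree almost surely.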


Remark 8.7: The Definition 8.34 of the predictable representation property 
given here agrees with the standard definition given for continuous martingales 
but is different to the definition for predictable representation with respect
to semimartingales, given in Liptser and Shiryaev (1989) p.250, Jacod and
Shiryaev (1987) p.172, Protter (1992) p.150. The general definition allows for
different predictable functions $h$ and $H$ to be used in the integrals with respect
to the continuous martingale part $M^c$ and the discrete martingale part $M^d$ of
$M$,


$$X(t) = X(0) + \int_0^t h(s)\,dM^c(s) + \int_0^t H(s)\,dM^d(s).$$


In this definition, the martingale M in Example 8.27 has the predictable rep- 
resentation property. 

The definition given here is more suitable for financial applications. Ac- 
cording to the financial mathematics theory an option can be priced if it can 
be replicated, which means that it is an integral of a predictable process H 
with respect to the discounted stock price process M, which is a martingale. 
The process H represents the number of shares bought/sold so it does not 
make sense to have H consist of two different components. 


8.13 Elements of the General Theory 


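The basic setup consists of the probability space $(\Omega, \mathcal{F}, P)$, where $\mathcal{F}$ is a $\sigma$-field
on $\Omega$ and $P$ is a probability on $\mathcal{F}$. A stochastic process is a map from $\mathbb{R}^+ \times \Omega$
to $\mathbb{R}$, namely $(t,\omega) \mapsto X(t,\omega)$. $\mathbb{R}^+$ has the Borel $\sigma$-field of measurable sets,
and $\mathcal{F}$ is the $\sigma$-field of measurable sets on $\Omega$. Only measurable processes are
considered, that is, for any $A \in \mathcal{B}(\mathbb{R})$


$$\{(t,\omega) : X(t,\omega) \in A\} \in \mathcal{B}(\mathbb{R}^+) \times \mathcal{F}.$$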


Theorem 8.40 (Fubini) Let $X(t)$ be a measurable stochastic process. Then
1. $P$-a.s. the functions $X(t,\omega)$ (trajectories) are (Borel) measurable.
2. If $EX(t)$ exists for all $t$, then it is measurable as a function of $t$.
3. If $\int_a^b E|X(t)|\,dt < \infty$, then almost all trajectories $X(t)$ are integrable
and $\int_a^b EX(t)\,dt = E\int_a^b X(t)\,dt$.


Let $\mathbb{F}$ be a filtration of increasing $\sigma$-fields on $\Omega$. Important classes of processes
are introduced via measurability with respect to various $\sigma$-fields of subsets of
$\mathbb{R}^+ \times \Omega$: adapted, progressively measurable processes, optional processes and
predictable processes, given in the order of inclusion.
$X$ is adapted if, for all $t$, $X(t)$ is $\mathcal{F}_t$-measurable. $X$ is progressively measurable
if, for any $t$ and any Borel set $A$, $\{(s,\omega) : s \le t,\ X(s,\omega) \in A\} \in \mathcal{B}([0,t]) \times \mathcal{F}_t$. Any progressively
measurable process is clearly adapted. It can be shown that any right- or
left-continuous adapted process is progressively measurable.


Definition 8.41

1. The $\sigma$-field generated by the adapted left-continuous processes is called
the predictable $\sigma$-field $\mathcal{P}$.

2. The $\sigma$-field generated by the adapted right-continuous processes is called
the optional $\sigma$-field $\mathcal{O}$.

3. A process is called predictable if it is measurable with respect to the
predictable $\sigma$-field $\mathcal{P}$; it is called optional if it is measurable with respect to
the optional $\sigma$-field $\mathcal{O}$.


Remarks

1. The predictable $\sigma$-field $\mathcal{P}$ is also generated by the adapted continuous
processes.

2. Define $\mathcal{F}_{t-} = \sigma\big(\bigcup_{s<t} \mathcal{F}_s\big)$, the smallest $\sigma$-field containing $\mathcal{F}_s$ for all $s < t$,
and $\mathcal{F}_{0-} = \mathcal{F}_0$. $\mathcal{F}_{t-}$ represents the information available prior to $t$. Then
the predictable $\sigma$-field $\mathcal{P}$ is generated by the sets $[s,t) \times A$ with $s < t$
and $A \in \mathcal{F}_{s-}$.

3. Predictable and optional $\sigma$-fields are also generated respectively by simple
adapted left-continuous processes and simple adapted right-continuous
processes.

4. Since a left-continuous adapted process $H(t)$ can be approximated by
right-continuous adapted processes ($H(t) = \lim_{\epsilon \to 0} H((t - \epsilon)+)$), any
predictable process is also optional. Therefore $\mathcal{P} \subset \mathcal{O}$.

5. The Poisson process is right-continuous and it can be shown that it
cannot be approximated by left-continuous adapted processes. Therefore
there are optional processes which are not predictable, that is, the inclusion $\mathcal{P} \subset \mathcal{O}$ is strict.

6. In discrete time optional is the same as adapted.


Stochastic Sets 


Subsets of $\mathbb{R}^+ \times \Omega$ are called stochastic sets. If $A$ is a stochastic set, then its
projection on $\Omega$ is $\pi_A = \{\omega : \exists\, t \text{ such that } (t,\omega) \in A\}$. $A$ is called evanescent if
$P(\pi_A) = 0$.
Two processes $X(t)$ and $Y(t)$ are called indistinguishable if the stochastic
set $A = \{(t,\omega) : X(t,\omega) \ne Y(t,\omega)\}$ is an evanescent set. A process
indistinguishable from zero is called $P$-negligible.

If $\tau_1$ and $\tau_2$ are stopping times, a closed stochastic interval is defined as
$[[\tau_1, \tau_2]] = \{(t,\omega) : \tau_1(\omega) \le t \le \tau_2(\omega)\}$. Similarly half-closed $[[\tau_1, \tau_2[[$, $]]\tau_1, \tau_2]]$,
and open $]]\tau_1, \tau_2[[$ stochastic intervals are defined. Double brackets are used to
emphasize that the intervals are subsets of $\mathbb{R}^+ \times \Omega$ and to distinguish them
from intervals on $\mathbb{R}^+$.

The stochastic interval $[[\tau,\tau]] = \{(t,\omega) : \tau(\omega) = t\}$ is called the graph of
the stopping time $\tau$.

A stochastic set $A$ is called thin if there are stopping times $\tau_n$ such that
$A = \bigcup_n [[\tau_n]]$.


Example 8.28: Let $N(t)$ be a Poisson process with rate $\lambda$, and $A = \{\Delta N \ne 0\}$.
Then $A = \bigcup_n [[\tau_n]]$, where $\tau_n$ is the time of the $n$-th jump. Hence $A$ is a thin set.


It can be shown that for any regular right-continuous process the stochastic
set of jumps $\{\Delta X \ne 0\}$ is a thin set (Liptser and Shiryayev (1989), p.4).


Classification of Stopping Times 


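Recall that $\tau$ is a stopping time with respect to filtration $\mathbb{F}$ if for all $t \ge 0$
the event $\{\tau \le t\} \in \mathcal{F}_t$. If $\mathbb{F}$ is right-continuous, then also $\{\tau < t\} \in \mathcal{F}_t$.
Events observed before or at time $\tau$ are described by the $\sigma$-field $\mathcal{F}_\tau$, defined as
the collection of sets $\{A \in \mathcal{F} : \text{for any } t,\ A \cap \{\tau \le t\} \in \mathcal{F}_t\}$. Events observed
before time $\tau$ are described by the $\sigma$-field $\mathcal{F}_{\tau-}$, the $\sigma$-field generated by $\mathcal{F}_0$
and the sets $A \cap \{\tau > t\}$, where $A \in \mathcal{F}_t$, $t > 0$.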

There are three types of stopping times that are used in stochastic calculus: 


1. predictable stopping times, 

2. accessible stopping times, 

3. totally inaccessible stopping times. 

$\tau$ is a predictable stopping time if there exists a sequence of stopping times
$\tau_n$ with $\tau_n < \tau$ and $\lim_n \tau_n = \tau$. In this case it is said that the sequence $\tau_n$
announces $\tau$.


Example 8.29: If $\tau$ is a stopping time then for any constant $a > 0$, $\tau + a$ is a
predictable stopping time. Indeed, it can be approached by $\tau_n = \tau + a - 1/n$.


Example 8.30: Let $B(t)$ be Brownian motion started at zero, $\mathbb{F}$ its natural filtration
and $\tau$ the first hitting time of 1, that is, $\tau = \inf\{t : B(t) = 1\}$. $\tau$ is a predictable
stopping time, since $\tau_n = \inf\{t : B(t) = 1 - 1/n\}$ converge to $\tau$.
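
The announcing sequence of Example 8.30 can be illustrated on a discretized path. The sketch below is only an approximation under stated assumptions: Brownian motion is replaced by a Gaussian random walk with step `dt`, and the names `brownian_path` and `first_hitting_index`, the horizon and the seed are illustrative choices. On a grid the path may overshoot a level, so the hitting indices are only non-decreasing rather than strictly increasing, and a level may not be reached within the simulated horizon.

```python
import random

def brownian_path(n_steps=50_000, dt=1e-3, seed=1):
    """Gaussian random walk approximating Brownian motion started at zero."""
    rng = random.Random(seed)
    sd = dt ** 0.5
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path

def first_hitting_index(path, level):
    """Index of the first grid point where the path is at or above level, else None."""
    for i, x in enumerate(path):
        if x >= level:
            return i
    return None

path = brownian_path()
levels = [1.0 - 1.0 / n for n in range(1, 11)]       # 0, 1/2, 2/3, ..., 9/10
tau_n = [first_hitting_index(path, lv) for lv in levels]
tau = first_hitting_index(path, 1.0)                   # may be None on a finite horizon
print(tau_n, tau)
```

Each level $1 - 1/n$ must be crossed before level 1 is reached, which is why the indices in `tau_n` never exceed the index of `tau` when the latter exists.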
$\tau$ is an accessible stopping time if it is possible to announce $\tau$, but with
different sequences on different parts of $\Omega$, that is, $[[\tau]] \subset \bigcup_n [[\tau_n]]$, where $\tau_n$ are
predictable stopping times. All other types of stopping times are called totally
inaccessible.


Example 8.31: Let $N(t)$ be a Poisson process, $\mathbb{F}$ its natural filtration and $\tau$ the
time of the first jump, $\tau = \inf\{t : N(t) = 1\}$. $\tau$ is a totally inaccessible stopping
time. Any predictable stopping time $\tau_n < \tau$ is a constant, since $\mathcal{F}_t \cap \{t < \tau\}$ is trivial.
But $\tau$ has a continuous distribution (exponential), thus it cannot be approached by
constants.


The optional $\sigma$-field is generated by the stochastic intervals $[[0,\tau[[$, where
$\tau$ is a stopping time. The predictable $\sigma$-field is generated by the stochastic
intervals $[[0,\tau]]$.

A set $A$ is called predictable if its indicator is a predictable process, $I_A \in \mathcal{P}$.

The following results allow us to decide on predictability. 


Theorem 8.42 Let $X(t)$ be a predictable process and $\tau$ be a stopping time.
Then


1. $X(\tau)I(\tau < \infty)$ is $\mathcal{F}_{\tau-}$-measurable,
2. the stopped process $X(t \wedge \tau)$ is predictable.
For a proof see, for example, Liptser and Shiryayev (1989) p. 13. 


Theorem 8.43 An adapted regular process $X$ is predictable if and only if for
any predictable stopping time $\tau$ the random variable $X(\tau)I(\tau < \infty)$ is
$\mathcal{F}_{\tau-}$-measurable and for each totally inaccessible stopping time $\tau$ one of the following
two conditions holds:


1. $X(\tau) = X(\tau-)$ on $\{\tau < \infty\}$,
2. the set $\{\Delta X \ne 0\} \cap [[\tau]]$ is $P$-evanescent.
For a proof see, for example, Liptser and Shiryayev (1989) p. 16. 


Theorem 8.44 A stopping time $\tau$ is predictable if and only if for any bounded
martingale $M$, $E(M(\tau)I(\tau < \infty)) = E(M(\tau-)I(\tau < \infty))$.


Theorem 8.45 The compensator A(t) is continuous if and only if the jump 
times of the process X(t) are totally inaccessible. 


See, for example, Liptser and Shiryayev (2001) for the proof. 


Example 8.32: The compensator of the Poisson process is t, which is continuous. 
By the above result the jump times of the Poisson process are totally inaccessible. 
This was shown in Example 8.31. 
It can be shown that for Brownian motion and diffusions any stopping time
is predictable. This implies that the class of optional processes is the same as 
the class of predictable processes. 


Theorem 8.46 For Brownian motion filtration any martingale (local martin- 
gale) is continuous and any positive stopping time is predictable. Any optional 
process is also predictable, O = P. 


A similar result holds for diffusions (see, for example, Rogers and Williams (1990),
p.338).


Remark 8.8: It can be shown that $\langle X, X\rangle$ is the conditional quadratic
variation of $[X, X]$ conditioned on the predictable events $\mathcal{P}$.


8.14 Random Measures and Canonical Decomposition

The canonical decomposition of semimartingales with jumps uses the concepts
of a random measure and its compensator, as well as integrals with respect
to random measures. We do not use this material elsewhere in the book.
However, the canonical decomposition is often met in research papers.

Random Measure for a Single Jump 


Let $\xi$ be a random variable. For a Borel set $A \subset \mathbb{R}$ define

$$\mu(\omega, A) = I_A(\xi(\omega)) = I(\xi(\omega) \in A). \tag{8.70}$$


Then $\mu$ is a random measure, meaning that for each $\omega \in \Omega$, $\mu(\omega, A)$ is a
measure when $A$ varies, $A \in \mathcal{B}(\mathbb{R})$. Its (random) distribution function has a
single jump of size 1 at $\xi$. The following random Stieltjes integrals consist of
a single term:


$$\int_{\mathbb{R}} x\,\mu(\omega, dx) = \xi(\omega), \quad \text{and for a function } h, \quad \int_{\mathbb{R}} h(x)\,\mu(\omega, dx) = h(\xi(\omega)). \tag{8.71}$$

There is a special notation for this random integral


$$h * \mu := \int_{\mathbb{R}} h(x)\,\mu(dx). \tag{8.72}$$




Random Measure of Jumps and its Compensator in Discrete Time 


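Let $X_0, \ldots, X_n, \ldots$ be a sequence of random variables, adapted to $\mathcal{F}_n$, and let
$\xi_n = \Delta X_n = X_n - X_{n-1}$. Let $\mu_n(A) = I_A(\xi_n)$ be jump measures, and let


$$\nu_n(A) = E(\mu_n(A)\,|\,\mathcal{F}_{n-1}) = E(I_A(\xi_n)\,|\,\mathcal{F}_{n-1}) = P(\xi_n \in A\,|\,\mathcal{F}_{n-1})$$


be the conditional distributions, $n = 0, 1, \ldots$. Define


$$\mu((0,n], A) = \sum_{i=1}^n \mu_i(A), \quad \text{and} \quad \nu((0,n], A) = \sum_{i=1}^n \nu_i(A). \tag{8.73}$$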


Then for each $A$ the sequence $\mu((0,n], A) - \nu((0,n], A)$ is a martingale. The
measure $\mu((0,n], A)$ is called the measure of jumps of the sequence $X_n$, and
$\nu((0,n], A)$ its compensator ($A$ does not include 0). Clearly, the measure
$\mu = \{\mu((0,n])\}_{n \ge 1}$ admits representation


$$\mu = \nu + (\mu - \nu), \tag{8.74}$$


where $\nu = \{\nu((0,n])\}_{n \ge 1}$ is predictable, and $\mu - \nu = \{\mu((0,n]) - \nu((0,n])\}_{n \ge 1}$
is a martingale. This is Doob's decomposition for random measures. With
notation (8.72)

$$X_n = X_0 + (x * \mu)_n. \tag{8.75}$$


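Regular conditional distributions exist and for a function $h(x)$ the conditional
expectations can be written as integrals with respect to these


$$E(h(\xi_n)\,|\,\mathcal{F}_{n-1}) = \int_{\mathbb{R}} h(x)\,\nu_n(dx),$$


provided $h(\xi_n)$ is integrable, $E|h(\xi_n)| < \infty$.
Assume now that the $\xi_n$ are integrable. Then the Doob decomposition (8.4) of $X$ is


$$X_n = X_0 + \sum_{i=1}^n E(\xi_i\,|\,\mathcal{F}_{i-1}) + \sum_{i=1}^n \big(\xi_i - E(\xi_i\,|\,\mathcal{F}_{i-1})\big) = X_0 + A_n + M_n. \tag{8.76}$$


Using random measures and their integrals, we can express $A_n$ and $M_n$ as


$$A_n = \sum_{i=1}^n E(\xi_i\,|\,\mathcal{F}_{i-1}) = \sum_{i=1}^n \int_{\mathbb{R}} x\,\nu_i(dx) = (x * \nu)_n, \tag{8.77}$$

$$M_n = \sum_{i=1}^n \big(\xi_i - E(\xi_i\,|\,\mathcal{F}_{i-1})\big) = \sum_{i=1}^n \int_{\mathbb{R}} x\,\big(\mu_i(dx) - \nu_i(dx)\big) = (x * (\mu - \nu))_n.$$


Thus the semimartingale decomposition of $X$ is given by using the random
measure and its compensator


$$X_n = X_0 + (x * \nu)_n + (x * (\mu - \nu))_n. \tag{8.78}$$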
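
As an illustration of (8.76)-(8.78), the following sketch computes the two parts of the decomposition for a simple random walk. It is an assumed toy case, not an example from the text: the jumps are taken i.i.d. with $P(\xi = 1) = p$ and $P(\xi = -1) = 1 - p$, so that every $\nu_i$ is the same deterministic measure, and the names `P_UP`, `compensator_mean` and `decompose` are hypothetical.

```python
import random

P_UP = 0.7  # assumed P(xi = +1); P(xi = -1) = 1 - P_UP

def compensator_mean():
    """Integral of x against nu_i(dx) = P_UP * delta_{+1} + (1 - P_UP) * delta_{-1}."""
    return (+1) * P_UP + (-1) * (1.0 - P_UP)

def decompose(jumps, x0=0.0):
    """Return lists X, A, M with X_n = x0 + A_n + M_n."""
    drift = compensator_mean()
    X, A, M = [x0], [0.0], [0.0]
    for xi in jumps:
        X.append(X[-1] + xi)
        A.append(A[-1] + drift)        # predictable part (x * nu)_n
        M.append(M[-1] + xi - drift)   # martingale part (x * (mu - nu))_n
    return X, A, M

rng = random.Random(0)
jumps = [1 if rng.random() < P_UP else -1 for _ in range(1000)]
X, A, M = decompose(jumps, x0=2.0)
print(X[-1], A[-1], M[-1])
```

Here $A_n = n(2p - 1)$ is deterministic because the jumps are independent of the past; for jumps whose conditional law depends on $\mathcal{F}_{i-1}$ the drift term would have to be recomputed at each step.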
However, the jumps of $X$, $\xi_n = \Delta X_n$, may not be integrable. Then the
term $(x * \nu)_n$ is not defined. In this case a truncation function is used, such
as $h(x) = xI(|x| \le 1)$, and a similar decomposition is achieved, called the
canonical decomposition,


$$\begin{aligned} X_n &= X_0 + \sum_{i=1}^n E(h(\xi_i)\,|\,\mathcal{F}_{i-1}) + \sum_{i=1}^n \big(h(\xi_i) - E(h(\xi_i)\,|\,\mathcal{F}_{i-1})\big) + \sum_{i=1}^n \big(\xi_i - h(\xi_i)\big) \\ &= X_0 + (h * \nu)_n + (h * (\mu - \nu))_n + ((x - h(x)) * \mu)_n. \end{aligned} \tag{8.79}$$


The above representation has well-defined terms and in addition it has another 
advantage that carries over to the continuous time case. 
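
To see why truncation helps, one can take jumps without a finite mean. The sketch below is a hedged illustration under assumptions not made in the text: the jumps are i.i.d. standard Cauchy (simulated by the inverse distribution function), so by symmetry $E(h(\xi_i)\,|\,\mathcal{F}_{i-1}) = 0$ and the term $(h * \nu)_n$ vanishes. The names `h` and `canonical_terms` are illustrative.

```python
import math
import random

def h(x):
    """Truncation function h(x) = x * I(|x| <= 1)."""
    return x if abs(x) <= 1.0 else 0.0

def canonical_terms(jumps):
    """Split the sum of jumps into the small-jump and large-jump parts of (8.79)."""
    small = sum(h(xi) for xi in jumps)        # (h * (mu - nu))_n, since (h * nu)_n = 0 here
    large = sum(xi - h(xi) for xi in jumps)   # ((x - h(x)) * mu)_n, jumps with |xi| > 1
    return small, large

rng = random.Random(0)
# Standard Cauchy variables: no finite mean, so (x * nu)_n is not defined.
jumps = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(1000)]
small, large = canonical_terms(jumps)
x0 = 0.0
x_n = x0 + sum(jumps)
print(small, large, x_n)
```

The small-jump part is a sum of bounded, centred terms and behaves like a martingale, while the large-jump part collects the few jumps exceeding 1 in absolute value.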


Random Measure of Jumps and its Compensator 


Let $X$ be a semimartingale. For a fixed $t$ consider the jump $\Delta X(t)$. Taking
$\xi = \Delta X(t)$, we obtain the measure of the jump at $t$


$$\mu(\{t\}, A) = I_A(\Delta X(t)), \quad \text{with} \quad \int_{\mathbb{R}} x\,\mu(\{t\}, dx) = \Delta X(t). \tag{8.80}$$

Now consider the measure of jumps of $X$ (in $\mathbb{R}^+ \times \mathbb{R}_0$, with $\mathbb{R}_0 = \mathbb{R} \setminus \{0\}$)


$$\mu((0,t] \times A) = \sum_{0 < s \le t} I_A(\Delta X(s)) \tag{8.81}$$


for a Borel set $A$ that does not include zero (there are no jumps of size 0: if
$\Delta X(t) = 0$ then $t$ is a point of continuity of $X$). It is possible to define the
compensator $\nu$ of $\mu$, such that $\mu((0,t] \times A) - \nu((0,t] \times A)$ is a local martingale.

For the canonical decomposition of semimartingales, similar to (8.79), firstly
large jumps are taken out, then the small jumps are compensated as follows.
Consider


$$((x - h(x)) * \mu)(t) = \int_0^t \int_{\mathbb{R}_0} (x - h(x))\,\mu(ds, dx) = \sum_{s \le t} \big(\Delta X(s) - h(\Delta X(s))\big),$$


where $h(x)$ is a truncation function. This is a sum over "large" jumps with
$|\Delta X(s)| > 1$ (since $x - h(x) = 0$ for $|x| \le 1$). Since the sum of squares
of jumps is finite (Corollary 8.31, (8.51)), the above sum has only finitely
many terms, hence it is finite. Thus the following canonical decomposition of
a semimartingale is obtained


$$\begin{aligned} X(t) &= X(0) + A(t) + X^{cm}(t) + (h(x) * (\mu - \nu))(t) + ((x - h(x)) * \mu)(t) \\ &= X(0) + A(t) + X^{cm}(t) + \int_0^t \int_{\mathbb{R}_0} h(x)\,d(\mu - \nu) + \int_0^t \int_{\mathbb{R}_0} (x - h(x))\,d\mu, \end{aligned}$$

where $A$ is a predictable process of finite variation, $X^{cm}$ is a continuous martingale
component of $X$, $\mu$ is the measure of jumps of $X$, and $\nu$ its compensator.
For a proof see Liptser and Shiryayev (1989), p.188, Shiryaev (1999), p. 663.
Let $C = \langle X^{cm}, X^{cm}\rangle$ (it always exists for continuous processes). The
following three processes appearing in the canonical decomposition (A, C, v) 
are called the triplet of predictable characteristics of the semimartingale X. 


Notes. Material for this chapter is based on Protter (1992), Rogers and 
Williams (1990), Metivier (1982), Liptser and Shiryayev (1989), Shiryaev 
(1999). 


8.15 Exercises 


Exercise 8.1: Let $\tau_1 \le \tau_2$ be stopping times. Show that $I_{(\tau_1, \tau_2]}(t)$ is a simple
predictable process.


Exercise 8.2: Let $H(t)$ be a regular adapted process, not necessarily
left-continuous. Show that for any $\delta > 0$, $H(t - \delta)$ is predictable.


Exercise 8.3: Show that a continuous process is locally integrable. Show 
that a continuous local martingale is locally square integrable. 


Exercise 8.4: $M$ is a local martingale and $E\int_0^T H^2(s)\,d[M, M](s) < \infty$. Show
that $\int_0^t H(s)\,dM(s)$ is a square integrable martingale.


Exercise 8.5: Find the variance of $\int_0^1 N(t-)\,dM(t)$, where $M$ is the
compensated Poisson process $M(t) = N(t) - t$.


Exercise 8.6: If $S$ and $T$ are stopping times, show that
1. $S \wedge T$ and $S \vee T$ are stopping times.
2. The events $\{S = T\}$, $\{S \le T\}$ and $\{S < T\}$ are in $\mathcal{F}_S$.
3. $\mathcal{F}_S \cap \{S \le T\} \subset \mathcal{F}_T \cap \{S \le T\}$.


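Exercise 8.7: Let $U$ be a positive random variable on a probability space
$(\Omega, \mathcal{F}, P)$, and $\mathcal{G}$ be a sub-$\sigma$-field of $\mathcal{F}$.


1. Let $t \ge 0$. Show that $\mathcal{F}_t := \{A \in \mathcal{F} : \exists B \in \mathcal{G} \text{ such that } A \cap \{U > t\} = B \cap \{U > t\}\}$ is a $\sigma$-field.

2. Show that $\mathcal{F}_t$ is a right-continuous filtration on $(\Omega, \mathcal{F}, P)$.

3. Show that $U$ is a stopping time for $\mathcal{F}_t$.

4. What are $\mathcal{F}_0$, $\mathcal{F}_{U-}$ and $\mathcal{F}_U$ equal to?


Exercise 8.8: Let $U_1, U_2, \ldots$ be (strictly) positive random variables on a
probability space $(\Omega, \mathcal{F}, P)$, and $\mathcal{G}_1, \mathcal{G}_2, \ldots$ be sub-$\sigma$-fields of $\mathcal{F}$. Suppose that
for all $n$, $U_1, U_2, \ldots, U_n$ are $\mathcal{G}_n$-measurable and denote by $T_n$ the random
variable $\sum_{i=1}^n U_i$. Set $\mathcal{F}_t = \bigcap_n \{A \in \mathcal{F} : \exists B_n \in \mathcal{G}_n \text{ such that } A \cap \{T_n > t\} = B_n \cap \{T_n > t\}\}$.


1. Show that $\mathcal{F}_t$ is a right-continuous filtration on $(\Omega, \mathcal{F}, P)$.

2. Show that for all $n$, $T_n$ is a stopping time for $\mathcal{F}_t$.

3. Suppose that $\lim_n T_n = \infty$ a.s. Show that $\mathcal{F}_{T_n} = \mathcal{G}_{n+1}$ and $\mathcal{F}_{T_n-} = \mathcal{G}_n$.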


Exercise 8.9: Let $B(t)$ be a Brownian motion and $H(t)$ be a predictable
process. Show that $M(t) = \int_0^t H(s)\,dB(s)$ is a Brownian motion if and only if
$\mathrm{Leb}(\{t : |H(t)| \ne 1\}) = 0$ a.s.

Exercise 8.10: Let $T$ be a stopping time. Show that the process $M(t) = 2B(t \wedge T) - B(t)$,
obtained by reflecting $B(t)$ at time $T$, is a Brownian motion.


Exercise 8.11: Let $B(t)$ and $N(t)$ be respectively a Brownian motion and a
Poisson process on the same space. Denote by $\bar N(t) = N(t) - t$ the compensated
Poisson process. Show that the following processes are martingales: $B(t)\bar N(t)$,
$\mathcal{E}(B)(t)\bar N(t)$ and $\mathcal{E}(\bar N)(t)B(t)$.

Exercise 8.12: $X(t)$ solves the SDE $dX(t) = \mu X(t)\,dt + aX(t-)\,dN(t) + \sigma X(t)\,dB(t)$.
Find the condition for $X(t)$ to be a martingale.

Exercise 8.13: Find the predictable representation for Y = B°(1). 


Exercise 8.14: Find the predictable representation for the martingale $e^{B(t) - t/2}$.


Exercise 8.15: Let $Y = \int_0^1 \mathrm{sign}(B(s))\,dB(s)$. Show that $Y$ has a Normal
distribution. Show that there is no deterministic function $H(s)$ such that
$Y = \int_0^1 H(s)\,dB(s)$. This shows that the assumption that $Y$, $B$ are jointly
Gaussian in Theorem 8.35 is indispensable.

Exercise 8.16: Find the quadratic variation of $|B(t)|$.

Exercise 8.17: (Stochastic Logarithm)

Let $U$ be a semimartingale such that $U(t)$ and $U(t-)$ are never zero. Show
that there exists a unique semimartingale $X$ with $X(0) = 0$ ($X = \mathcal{L}(U)$) such
that $dX(t) = \frac{dU(t)}{U(t-)}$, and


$$X(t) = \ln\left|\frac{U(t)}{U(0)}\right| + \frac{1}{2}\int_0^t \frac{d\langle U^c, U^c\rangle(s)}{U^2(s-)} - \sum_{s \le t}\left(\ln\left|\frac{U(s)}{U(s-)}\right| + 1 - \frac{U(s)}{U(s-)}\right), \tag{8.82}$$


see Kallsen and Shiryaev (2002).


Chapter 9 


Pure Jump Processes 


In this Chapter we consider pure jump processes, that is, processes that change 
only by jumps. Counting processes and Markov Jump processes are defined 
and their semimartingale representation is given. This representation allows us 
to see the process as a solution to a stochastic equation driven by discontinuous 
martingales. 


9.1 Definitions 


A counting process is determined by a sequence of non-negative random variables
$T_n$, satisfying $T_n < T_{n+1}$ if $T_n < \infty$ and $T_n = T_{n+1}$ if $T_n = \infty$. $T_n$
can be considered as the time of the $n$-th occurrence of an event, and they
are often referred to as arrival times. $N(t)$ counts the number of events that
occurred up to time $t$, that is,


$$N(t) = \sum_{n=1}^{\infty} I(T_n \le t), \quad N(0) = 0. \tag{9.1}$$


$N(t)$ is piece-wise constant and has jumps of size one at the points $T_n$. Such
processes are also known as simple point processes to distinguish them from
more general marked point processes, which are described by a sequence $(T_n, Z_n)$
for some random variables $Z_n$. $Z_n$, for example, may describe the size of the jump
at $T_n$.

The pure jump process $X$ is defined as follows.


$$X(t) = X(0) + \sum_{n=1}^{\infty} I(T_n \le t)\,Z_n. \tag{9.2}$$

Note that $X$ in (9.2) is right-continuous, piece-wise constant with the time of
the $n$-th jump at $T_n$, and $Z_n = X(T_n) - X(T_n-) = X(T_n) - X(T_{n-1})$ is the
size of the jump at $T_n$.
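
Definitions (9.1) and (9.2) translate directly into code. The following is a minimal sketch; the function names `counting_process` and `pure_jump_process` and the sample data are illustrative assumptions. It evaluates $N(t)$ and $X(t)$ from a sorted list of arrival times and the matching jump sizes.

```python
from bisect import bisect_right

def counting_process(arrival_times, t):
    """N(t) = number of arrival times T_n <= t; arrival_times must be sorted."""
    return bisect_right(arrival_times, t)

def pure_jump_process(arrival_times, jump_sizes, t, x0=0.0):
    """X(t) = X(0) + sum of Z_n over all n with T_n <= t."""
    n = counting_process(arrival_times, t)
    return x0 + sum(jump_sizes[:n])

T = [0.5, 1.2, 3.0]    # assumed arrival times
Z = [2.0, -1.0, 0.5]   # assumed jump sizes

print(counting_process(T, 1.2), pure_jump_process(T, Z, 1.2))
```

Using `bisect_right` counts an arrival that occurs exactly at $t$, which is what makes the resulting path right-continuous.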


9.2 Pure Jump Process Filtration 


The filtration $\mathbb{F}$ considered here is the natural filtration of the process. We
recall related definitions. For a process $X$ its natural filtration is defined by
(the augmentation of) $\sigma$-fields $\mathcal{F}_t = \sigma(X(s), 0 \le s \le t)$ and represents the
information obtained by observing the process on $[0,t]$. The strict past is
the information obtained by observing the process on $[0,t)$ and is denoted by
$\mathcal{F}_{t-} = \sigma(X(s), 0 \le s < t)$.

A non-negative random variable $\tau$, which is allowed to be infinity, is a
stopping time if $\{\tau \le t\} \in \mathcal{F}_t$ for every $t$. Thus by observing the process on
$[0,t]$ it is possible to deduce whether $\tau$ has occurred.

Information obtained from observing the process up to a stopping time $\tau$ is
$\mathcal{F}_\tau$, defined by $\mathcal{F}_\tau = \{A \in \mathcal{F} : \text{for any } t,\ A \cap \{\tau \le t\} \in \mathcal{F}_t\}$. The strict past of
$X$ at $\tau$ is described by the $\sigma$-field $\mathcal{F}_{\tau-} = \sigma(A \cap \{t < \tau\} : t \ge 0,\ A \in \mathcal{F}_t) \vee \mathcal{F}_0$.
Note that $\tau$ is $\mathcal{F}_{\tau-}$-measurable (take $A = \{\tau > t\} \in \mathcal{F}_t$). Clearly, $\mathcal{F}_{\tau-} \subset \mathcal{F}_\tau$.

Clearly, the arrival times $T_n$ are stopping times for $\mathbb{F}$. They are usually
taken as a localizing sequence. Note that $\mathcal{F}_{T_n} = \sigma((T_i, Z_i), i \le n)$ and that
$X(T_n-) = X(T_{n-1})$, since for $t$ satisfying $T_{n-1} \le t < T_n$, $X(t) = X(T_{n-1})$
and this value is kept until the next jump at time $T_n$. As $T_n$ is $\mathcal{F}_{T_n-}$-measurable,
$\mathcal{F}_{T_n-} = \sigma((T_i, Z_i), i \le n-1, T_n)$. Thus $Z_n$, the jump size at $T_n$, is the only information
in $\mathcal{F}_{T_n}$ not available in $\mathcal{F}_{T_n-}$.

It can be shown that $\mathcal{F}_t$ is right-continuous, that is, $\mathcal{F}_t = \mathcal{F}_{t+}$, as well as
the following result, which is essential in finding compensators.


Theorem 9.1 If $\tau$ is a stopping time, then for each $n$ there is a random
variable $\zeta_n$ which is $\mathcal{F}_{T_n}$-measurable such that


$$\tau \wedge T_{n+1} = (T_n + \zeta_n) \wedge T_{n+1} \quad \text{on } \{T_n \le \tau\}. \tag{9.3}$$


Proofs of the above results can be found for example, in Liptser and 
Shiryayev (1974), Karr (1986). 

Since compensators are predictable processes, it is important to have some 
criteria to decide on predictability. By definition, any adapted left continuous 
or continuous process is predictable. The following construction, which is often 
met in calculations, results in a predictable process. 


Theorem 9.2 Let $T_n$ be the arrival times in a pure jump process, and for all
$n = 0, 1, \ldots$, $Y_n(t)$ be an adapted process such that for any $t \in (T_n, T_{n+1}]$ it
is $\mathcal{F}_{T_n}$-measurable. Then the process $X(t) = \sum_{n=0}^{\infty} Y_n(t)\,I(T_n < t \le T_{n+1})$ is
predictable.
PROOF: We outline the proof. The process $I_n(t) = I(T_n < t \le T_{n+1})$ is
predictable. Indeed, $I_n(t) = I(T_n < t \le T_{n+1}) = I(t \le T_{n+1}) - I(t \le T_n)$.
Since for each $n$, $T_n$ is a stopping time, $\{T_n \ge t\} \in \mathcal{F}_t$ and therefore $I_n(t)$
is adapted. But it is left-continuous, hence it is predictable. Since $Y_n(t)$ is
"known" when $T_n < t \le T_{n+1}$, $X(t)$ is predictable.


Assumptions 


$T_\infty = \lim_{n \to \infty} T_n$ exists, since the $T_n$'s are non-decreasing. The results given below
hold for $t < T_\infty$, and in order to avoid repetitions we assume throughout that
$T_\infty = \infty$, unless stated otherwise. In the case of $T_\infty < \infty$ there are infinitely
many jumps on the finite time interval $[0, T_\infty)$, and it is said that explosion
occurs. We assume that there are no explosions. Sufficient conditions for
Markov Jump processes are given later.

Assume that the jumps are integrable, $E|Z_n| < \infty$ for all $n$. Under this
assumption $X$ is locally integrable, since $E|X(t \wedge T_n)| \le \sum_{i=1}^n E|Z_i| < \infty$, and
therefore it has a uniquely defined compensator $A$.

M(t) = X(t) — A(t) is the local martingale associated with X, also called 
the innovation martingale. 


9.3 Itô's Formula for Processes of Finite Variation


If a semimartingale $X$ is of finite variation, then its continuous martingale part
$X^{cm} = 0$, consequently $\langle X, X\rangle^c(t) = \langle X^{cm}, X^{cm}\rangle(t) = 0$. Therefore the term
containing the second derivative $f''$ disappears in (8.59). Moreover, since $X$
is of finite variation its continuous part $X^c$ satisfies $dX^c(t) = dX(t) - \Delta X(t)$,
and Itô's formula takes the form: for any $C^1$ function $f$


$$f(X(t)) - f(X(0)) = \int_0^t f'(X(s-))\,dX^c(s) + \sum_{s \le t}\big(f(X(s)) - f(X(s-))\big). \tag{9.4}$$

If the continuous part $X^c$ is zero, then the formula is an identity, representing
a function as the sum of its jumps. A similar formula holds for a function of
$n$ variables.
Stochastic Exponential 


The stochastic exponential (8.62) of finite variation processes simplifies to


$$\mathcal{E}(X)(t) = e^{X(t) - X(0)} \prod_{s \le t}(1 + \Delta X(s))e^{-\Delta X(s)} = e^{X^c(t)} \prod_{s \le t}(1 + \Delta X(s)), \tag{9.5}$$

where $X^c$ is the continuous part of $X$. The last equality is due to
$X^c(t) = X(t) - X(0) - \sum_{s \le t} \Delta X(s)$.


Example 9.1: Let $X(t)$ be a process with jumps of size one (counting process), so
that $\Delta X(s) = 0$ or 1. Its stochastic exponential is given by


$$\mathcal{E}(X)(t) = \prod_{s \le t}(1 + \Delta X(s)) = 2^{X(t) - X(0)}. \tag{9.6}$$
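
Formula (9.5) with $X^c = 0$ can be checked numerically. The sketch below is an illustration with hypothetical helper names; it relies on the standard characterization of the stochastic exponential as the solution of $U(t) = 1 + \int_0^t U(s-)\,dX(s)$, which for a pure jump process reduces to a finite sum over the jumps.

```python
def stochastic_exponential_pure_jump(jump_sizes):
    """Values of E(X) at time 0 and just after each jump, when X has no continuous part."""
    values = [1.0]
    for dx in jump_sizes:
        values.append(values[-1] * (1.0 + dx))  # product of (1 + Delta X(s))
    return values

def solves_linear_equation(jump_sizes, tol=1e-12):
    """Check U(t) = 1 + sum over jumps of U(s-) * Delta X(s) after every jump."""
    U = stochastic_exponential_pure_jump(jump_sizes)
    total = 1.0
    for k, dx in enumerate(jump_sizes):
        total += U[k] * dx          # U[k] is the value just before the (k+1)-th jump
        if abs(total - U[k + 1]) > tol:
            return False
    return True

counting_jumps = [1.0, 1.0, 1.0]    # a counting process with three jumps
print(stochastic_exponential_pure_jump(counting_jumps))
print(solves_linear_equation([0.5, -0.25, 2.0]))
```

For the counting process the values double at each jump, in agreement with (9.6).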


Integration by Parts for Processes of Finite Variation 


The integration by parts formula is obtained directly from the integral repre- 
sentation of the quadratic covariation (8.24) and (1.20). Recall that if X and 
Y are of finite variation, then their quadratic covariation is given by 


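$$[X,Y](t) = \sum_{0 < s \le t} \Delta X(s)\Delta Y(s) \quad \text{and} \quad [X](t) = [X,X](t) = \sum_{0 < s \le t} (\Delta X(s))^2. \tag{9.7}$$

Using (8.24) we obtain

$$X(t)Y(t) - X(0)Y(0) = \int_0^t X(s-)\,dY(s) + \int_0^t Y(s-)\,dX(s) + \sum_{0 < s \le t} \Delta X(s)\Delta Y(s). \tag{9.8}$$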


Remark 9.1: The following formula holds for finite variation processes.

$$\sum_{s \le t} \Delta X(s)\Delta Y(s) = \int_0^t \Delta X(s)\,dY(s). \tag{9.9}$$

Indeed, by letting $Y^c(t) = Y(t) - \sum_{s \le t} \Delta Y(s)$ be the continuous part of $Y$,
we have


$$\int_0^t \Delta X(s)\,dY(s) - \sum_{s \le t} \Delta X(s)\Delta Y(s) = \int_0^t \Delta X(s)\,dY^c(s) = 0, \tag{9.10}$$


since $Y^c(s)$ is continuous and $\Delta X(s)$ is different from zero at most at countably
many points.
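
For pure jump processes every integral in (9.8) is a finite sum, so the formula can be verified by direct arithmetic. The sketch below is illustrative; the names `path_values` and `integration_by_parts_terms` and the sample jumps are assumptions. The two processes are described by their jumps at a common list of event times, with a jump of 0 where a process does not move.

```python
def path_values(jumps, x0):
    """Values of a pure jump path at time 0 and after each event time."""
    values = [x0]
    for j in jumps:
        values.append(values[-1] + j)
    return values

def integration_by_parts_terms(dX, dY, x0, y0):
    """Return (left side, right side) of (9.8) at the final time."""
    X, Y = path_values(dX, x0), path_values(dY, y0)
    int_X_dY = sum(X[k] * dY[k] for k in range(len(dY)))  # X(s-) dY(s)
    int_Y_dX = sum(Y[k] * dX[k] for k in range(len(dX)))  # Y(s-) dX(s)
    bracket = sum(a * b for a, b in zip(dX, dY))          # [X, Y](t)
    lhs = X[-1] * Y[-1] - x0 * y0
    return lhs, int_X_dY + int_Y_dX + bracket

dX = [1.0, 0.0, -2.0]   # assumed jumps of X at three event times
dY = [0.5, 3.0, 1.0]    # assumed jumps of Y at the same times
lhs, rhs = integration_by_parts_terms(dX, dY, x0=1.0, y0=2.0)
print(lhs, rhs)
```

The bracket term only picks up the times at which both processes jump, here the first and the third event.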


9.4 Counting Processes

A counting process $N$ is a pure jump process with jumps of size one. Thus

$$[N, N](t) = \sum_{s \le t} (\Delta N(s))^2 = N(t). \tag{9.11}$$


Theorem 9.3 The compensator $A$ and the sharp bracket process of $N$ are the
same, $A = \langle N, N\rangle$.


PROOF: $N$ is of locally integrable variation, since $N(t \wedge T_n) \le n$ and $T_n$ is
a localizing sequence. $A$ is the unique predictable process such that $N(t) - A(t)$
is a local martingale. $\langle N, N\rangle$ is the unique predictable process such that
$[N, N](t) - \langle N, N\rangle(t)$ is a local martingale. The result follows from (9.11) and
the uniqueness of the compensator.


Heuristically the compensator for a counting process is given by

$$dA(t) = E(dN(t)\,|\,\mathcal{F}_{t-}), \tag{9.12}$$


where $\mathcal{F}_{t-}$ denotes the information available prior to time $t$ by observing the
process over $[0,t)$, and $t + dt > t$. $dM(t) = d(N - A)(t) = dN(t) - dA(t)$ is
that part of $dN(t)$ that cannot be foreseen from the observations of $N$ over
$[0,t)$.

The next result shows the relation between the sharp bracket of the martingale
and the compensator in a counting process. Since the proof is a straight
application of stochastic calculus rules, it is given below. It is useful for
calculation of the variance of $M$; indeed, by Theorem 8.24, $EM^2(t) = E\langle M, M\rangle(t)$.


Theorem 9.4 Let $M = N - A$. Then $\langle M, M\rangle(t) = \int_0^t (1 - \Delta A(s))\,dA(s)$. In
particular, if $A$ is continuous, then $\langle M, M\rangle(t) = A(t)$.


PROOF: By integration by parts (9.8), we have

$$M^2(t) = 2\int_0^t M(s-)\,dM(s) + \sum_{s \le t} (\Delta M(s))^2. \tag{9.13}$$


Use $\Delta M(s) = \Delta N(s) - \Delta A(s)$ and expand the sum to obtain


$$\sum_{s \le t} (\Delta M(s))^2 = \sum_{s \le t} \Delta N(s) - 2\sum_{s \le t} \Delta N(s)\Delta A(s) + \sum_{s \le t} (\Delta A(s))^2,$$


where we used that since $N$ is a simple process $(\Delta N(s))^2 = \Delta N(s)$. Thus we
obtain by using formula (9.9) and $N = M + A$,


$$\begin{aligned} M^2(t) &= 2\int_0^t M(s-)\,dM(s) + N(t) - 2\int_0^t \Delta A(s)\,dN(s) + \int_0^t \Delta A(s)\,dA(s) \\ &= \int_0^t \big(2M(s-) + 1 - 2\Delta A(s)\big)\,dM(s) + \int_0^t (1 - \Delta A(s))\,dA(s). \end{aligned}$$

The process in the first integral is predictable, because $M(s-)$ is adapted and
left-continuous, $A(s)$ is predictable, so that $\Delta A(s)$ is also predictable. Therefore
the first integral is a local martingale. The second integral is predictable,
since $A$ is predictable. Thus the above equation is the Doob-Meyer decomposition
of $M^2$. The result now follows by the uniqueness of the Doob-Meyer
decomposition.


We give examples of processes and their compensators next. 


Point Process of a Single Jump 


Let $T$ be a random variable and the process $N$ have a single jump at the
random time $T$, that is, $N(t) = I(T \le t)$. Let $F$ be the distribution function
of $T$.


Theorem 9.5 The compensator $A(t)$ of $N(t) = I(T \le t)$ is given by

$$A(t) = \int_0^{t \wedge T} \frac{dF(s)}{1 - F(s-)}. \tag{9.14}$$


PROOF: $A(t)$ is clearly predictable. To show that $N(t) - A(t)$ is a martingale,
by Theorem 7.17 it is enough to show that $EN(S) = EA(S)$ for any stopping
time $S$. By Theorem 9.1 there exists an $\mathcal{F}_0$-measurable (i.e. almost surely
constant) random variable $\zeta$ such that


$$\{S \ge T\} = \{S \wedge T = T\} = \{\zeta \wedge T = T\} = \{T \le \zeta\}.$$


Therefore

$$\begin{aligned} EN(S) &= P(S \ge T) = P(T \le \zeta) \\ &= \int_0^{\zeta} dF(t) = \int_0^{\zeta} \frac{P(T \ge t)}{1 - F(t-)}\,dF(t) \\ &= E\left(\int_0^{\zeta} \frac{I(T \ge t)}{1 - F(t-)}\,dF(t)\right) = E\left(\int_0^{\zeta \wedge T} \frac{dF(t)}{1 - F(t-)}\right) \\ &= E\left(\int_0^{S \wedge T} \frac{dF(t)}{1 - F(t-)}\right) = EA(S). \end{aligned}$$


Thus (9.14) is obtained.


If $F$ has a density $f$, then $A(t) = \int_0^{t \wedge T} h(s)\,ds$, where

$$h(t) = \frac{f(t)}{1 - F(t)} \tag{9.15}$$

is called the hazard function and it gives the likelihood of the occurrence of
the jump at $t$ given that the jump has not occurred before $t$.
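
Theorems 9.4 and 9.5 can be checked exactly when $T$ takes finitely many values, because then the integral in (9.14) is a finite sum over the atoms of $F$ and all expectations are finite sums. The sketch below uses an assumed distribution of $T$ on $\{1, 2, 3\}$ and hypothetical function names; it verifies $EN(t) = EA(t)$ and $EM^2(t) = E\langle M, M\rangle(t)$ with $\langle M, M\rangle(t) = \sum_{k \le t \wedge T} (1 - \Delta A(k))\Delta A(k)$.

```python
def hazards(pmf):
    """Jumps of the compensator: Delta A(k) = P(T = k) / P(T >= k) = dF(k) / (1 - F(k-))."""
    out, survival = {}, 1.0
    for k in sorted(pmf):
        out[k] = pmf[k] / survival
        survival -= pmf[k]
    return out

def compensator(pmf, T, t):
    """A(t) from (9.14) for a purely atomic F, on the path where the jump time equals T."""
    haz = hazards(pmf)
    return sum(haz[k] for k in haz if k <= min(t, T))

def sharp_bracket(pmf, T, t):
    """<M, M>(t) = sum of (1 - Delta A) * Delta A over atoms up to t ^ T (Theorem 9.4)."""
    haz = hazards(pmf)
    return sum((1.0 - haz[k]) * haz[k] for k in haz if k <= min(t, T))

def expectations(pmf, t):
    """Exact values of EN(t), EA(t), EM^2(t) and E<M, M>(t) by enumeration over T."""
    EN = EA = EM2 = Ebr = 0.0
    for T, p in pmf.items():
        N = 1.0 if T <= t else 0.0
        A = compensator(pmf, T, t)
        EN += p * N
        EA += p * A
        EM2 += p * (N - A) ** 2
        Ebr += p * sharp_bracket(pmf, T, t)
    return EN, EA, EM2, Ebr

pmf = {1: 0.2, 2: 0.5, 3: 0.3}   # assumed distribution of T
for t in (1, 2, 3):
    print(t, expectations(pmf, t))
```

In this discrete case the compensator has jumps, so the factor $1 - \Delta A(s)$ in Theorem 9.4 matters; with a continuous $F$ it would disappear.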
Compensators of Counting Processes


The next result gives an explicit form of the compensator of a general counting
process. Since the proof uses the same ideas as above, it is omitted.


Theorem 9.6 Let $N$ be a counting process generated by the sequence $T_n$. Denote
by $U_{n+1} = T_{n+1} - T_n$ the inter-arrival times, $T_0 = 0$. Let $F_n(t) =
P(U_{n+1} \le t\,|\,T_1, \ldots, T_n)$ denote the regular conditional distributions, and
$F_0(t) = P(T_1 \le t)$. Then the compensator $A(t)$ is given by


$$A(t) = \sum_{i=0}^{\infty} \int_0^{t \wedge T_{i+1} - t \wedge T_i} \frac{dF_i(s)}{1 - F_i(s-)}. \tag{9.16}$$


Note that if the conditional distributions $F_n$ in the above theorem are continuous
with $F_n(0) = 0$, then by changing variables we have


$$\int_0^a \frac{dF_n(s)}{1 - F_n(s-)} = \int_0^a \frac{dF_n(s)}{1 - F_n(s)} = -\log(1 - F_n(a)), \tag{9.17}$$


and equation (9.16) can be simplified accordingly. 


Renewal Process 


A renewal process $N$ is a point process in which all inter-arrival times are
independent and identically distributed, that is, $T_1, T_2 - T_1, \ldots, T_{n+1} - T_n$ are i.i.d.
with distribution function $F(x)$. In this case all the conditional distributions
$F_n$ in the previous theorem are given by $F$. As a result of Theorem 9.6 we
obtain the following corollary.


Corollary 9.7 Assume that the inter-arrival distribution is continuous and
$F(0) = 0$. Then the compensator of the renewal process is given by


$$A(t) = -\sum_{n=1}^{\infty} \log\big(1 - F(t \wedge T_n - t \wedge T_{n-1})\big). \tag{9.18}$$
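
Formula (9.18) is easy to evaluate along an observed path. The sketch below is a hedged illustration: the function name `renewal_compensator` and the sample arrival times are assumptions, and the list of arrival times is assumed to contain at least one arrival after $t$ so that the current, incomplete inter-arrival interval is included. For exponential inter-arrival times with rate $\lambda$ the formula returns $\lambda t$, the compensator of the Poisson process.

```python
import math

def renewal_compensator(arrival_times, t, cdf):
    """A(t) = -sum_n log(1 - F(t ^ T_n - t ^ T_{n-1})), with T_0 = 0, as in (9.18)."""
    total, prev = 0.0, 0.0
    for T in arrival_times:
        gap = min(t, T) - min(t, prev)   # time spent in this inter-arrival interval up to t
        if gap > 0.0:
            total -= math.log(1.0 - cdf(gap))
        prev = T
    return total

rate = 2.0
exp_cdf = lambda x: 1.0 - math.exp(-rate * x)   # exponential inter-arrival distribution
arrivals = [0.3, 1.0, 2.5]                      # assumed observed arrival times
print(renewal_compensator(arrivals, 2.0, exp_cdf))
```

Replacing `exp_cdf` by another continuous distribution function with $F(0) = 0$ gives a compensator that restarts its growth after each arrival, reflecting the dependence on the age of the process.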
Stochastic Intensity 
Definition 9.8 If $A(t) = \int_0^t \lambda(s)\,ds$, where $\lambda(t)$ is a positive predictable
process, then $\lambda(t)$ is called the stochastic intensity of $N$.


Note that if $A(t)$ is deterministic and differentiable with derivative $A'(t)$, and
$A(t) = \int_0^t A'(s)\,ds$, then $\lambda(t) = A'(t)$ is predictable and is the stochastic
intensity. If $A(t)$ is random and differentiable with derivative $A'(t)$, satisfying
$A(t) = \int_0^t A'(s)\,ds$, then $\lambda(t) = A'(t-)$. Indeed, $A(t) = \int_0^t A'(s-)\,ds$.
If the stochastic intensity exists, then by definition of the compensator,
$N(t) - \int_0^t \lambda(s)\,ds$ is a local martingale. For counting processes a heuristic
interpretation of the intensity is given by


$$\lambda(t)\,dt = dA(t) = E(dN(t)\,|\,\mathcal{F}_{t-}) = P(dN(t) = 1\,|\,\mathcal{F}_{t-}). \tag{9.19}$$


If the stochastic intensity exists, then the compensator is continuous and by
Theorem 9.4 the sharp bracket of the martingale $M = N - A$ is given by


$$\langle M, M\rangle(t) = \int_0^t \lambda(s)\,ds. \tag{9.20}$$


Example 9.2:


1. A deterministic point process is its own compensator, so it does not have
stochastic intensity.


2. Stochastic intensity for the renewal process with continuous inter-arrival
distribution $F$ is given by $h(V(t-))$, where $h$ is the hazard function and $V(t) =
t - T_{N(t)}$, called the age process. This can be seen by differentiating the
compensator in (9.18).


3. Stochastic intensity for the renewal process with a discrete inter-arrival
distribution $F$ does not exist.


Non-homogeneous Poisson Processes 


Theorem 9.9 Let N(t) be point process with a continuous deterministic com- 
pensator A(t). Then it has independent Poisson distributed increments, that 
is, the distribution of N(t) — N(s) is Poisson with parameter A(t) — A(s), 
O0<s<t. 


If A(t) has a density A(t), that is, A(t = fiA s)ds, then N(t) is called the 
non-homogeneous Poisson process a rate A(s). 

PROOF: We prove the result by an application of the stochastic exponential. 
M(t) = N(t) — A(t) is a martingale. For a fixed 0 < u < 1, —uM (t) is also a 
martingale. Consider €(uM)(t). By (9.5) 


E(-uM)(t) = e"4©]](1 -uAM(s)) 
s<t 
eva) jja = uAN(s)) = emia = uN 
s<t 
= "AWEN log(1—u) | (9.21) 


where we have used that $\Delta N(s)$ is zero or one. Stochastic exponential of
a martingale is always a local martingale, but in this case it is also a true
martingale by Theorem 7.21. Indeed, since $A(t)$ is deterministic and
non-decreasing,


$$E\sup_{t \le T} e^{uA(t) + N(t)\log(1-u)} \le e^{uA(T)}\, E\sup_{t \le T} e^{N(t)\log(1-u)} \le e^{uA(T)} < \infty,$$


and the condition of Theorem 7.21 is satisfied. Taking expectations in (9.21)
we obtain the probability generating function of $N(t)$,


$$E\big((1-u)^{N(t)}\big) = e^{-uA(t)},$$

or with $v = 1 - u$,

$$E\big(v^{N(t)}\big) = e^{-(1-v)A(t)}. \tag{9.22}$$


This shows that N (t) is a Poisson random variable with parameter A(t). If 
we take conditional expectation in (9.21) and use the martingale property, we 
obtain in the same way that for all s < t 


$$E\big(v^{N(t)-N(s)}\,\big|\,\mathcal{F}_s\big) = e^{-(1-v)(A(t)-A(s))}, \tag{9.23}$$


which shows that the distribution of N(t)— N(s) does not depend on the past 
and is Poisson. 
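
As a rough numerical illustration of Theorem 9.9, the sketch below simulates a non-homogeneous Poisson process by thinning a homogeneous one and compares the sample mean and variance of $N(t)$ with $A(t)$. The intensity $\lambda(t) = 1 + \sin^2 t$, the horizon, the seed and all function names are assumptions chosen for the example; they do not come from the text.

```python
import math
import random

def simulate_nhpp_count(rate, rate_max, t_end, rng):
    """Count points in [0, t_end] of a Poisson process with intensity
    rate(t) <= rate_max, using thinning of a rate_max homogeneous process."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate_max)          # next candidate point
        if t > t_end:
            return count
        if rng.random() * rate_max <= rate(t):  # keep with prob rate(t)/rate_max
            count += 1

def rate(t):
    # Illustrative intensity (an assumption), bounded above by 2
    return 1.0 + math.sin(t) ** 2

def compensator(t):
    # A(t) = int_0^t (1 + sin^2 s) ds = 1.5 t - sin(2t)/4
    return 1.5 * t - math.sin(2.0 * t) / 4.0

rng = random.Random(0)
t_end = 3.0
counts = [simulate_nhpp_count(rate, 2.0, t_end, rng) for _ in range(20000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(mean, var, compensator(t_end))
```

For a Poisson variable the mean and the variance coincide, so both sample statistics should be close to $A(3)$, up to Monte Carlo error.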


A similar result holds if the compensator is deterministic but discontinuous. 
The proof is more involved and can be found for example, in Liptser and 
Shiryayev (1974) p. 279, where the form of the distribution of the increments 
is also given. 


Theorem 9.10 Let N(t) be a point process with a deterministic compensator 
A(t). Then it has independent increments. 


The following result states that a point process with a continuous, but 
possibly random, compensator can be transformed into Poisson process by a 
random change of time. Compare this result to change of time for continuous 
martingales, Dambis, Dubins-Schwarz Theorem 7.37. 


Theorem 9.11 Let a counting process $N(t)$ have a continuous compensator
$A(t)$ and $\lim_{t\to\infty} A(t) = \infty$. Define $\rho(t) = \inf\{s \ge 0 : A(s) = t\}$. Let
$K(t) = N(\rho(t))$ and $\mathcal{G}_t = \mathcal{F}_{\rho(t)}$. Then the process $K(t)$ with respect to filtration
$\mathcal{G}_t$ is Poisson with rate 1.


The proof can be found in Liptser and Shiryayev (1974) p. 280, (2001) and
is not given here. To convince ourselves of the result, consider the case when
$A(t)$ is strictly increasing; then $\rho(t)$ is the usual inverse, that is, $A(\rho(t)) = t$.
Then $EK(t) = EN(\rho(t)) = EA(\rho(t)) = t$, so that $K$ has the right mean.

For more information on point processes see, for example, Liptser and
Shiryayev (2001), (1989) and Karr (1986).


Compensators of Pure Jump Processes


Let for all $t \ge 0$


$$X(t) = X(0) + \sum_{n=1}^{\infty} Z_n I(T_n \le t) \tag{9.24}$$

be a pure jump point process generated by the sequence $(T_n, Z_n)$. By using
the same arguments as in the proof of Theorem 9.6 we can show the following 
result. 


Theorem 9.12 Let $F_n(t) = P(T_{n+1} - T_n \le t\,|\,\mathcal{F}_{T_n})$ denote the regular
conditional distributions of inter-arrival times, $F_0(t) = P(T_1 \le t)$, and
$m_n = E(Z_{n+1}\,|\,\mathcal{F}_{T_n}) = E(X(T_{n+1}) - X(T_n)\,|\,\mathcal{F}_{T_n})$ denote the conditional expectation
of the jump sizes. Then the compensator $A(t)$ is given by


$$A(t) = \sum_{n=0}^{\infty} m_n \int_0^{t \wedge T_{n+1} - t \wedge T_n} \frac{dF_n(s)}{1 - F_n(s-)}. \tag{9.25}$$


The following observation is frequently used in the calculus of pure jump 
processes. 


Theorem 9.13 If $X(t)$ is a pure jump process, then for a function $f$, $f(X(t))$
is also a pure jump process with the same jump times $T_n$. The size of the jump
at $T_n$ is given by $Z'_n = f(X(T_n)) - f(X(T_n-))$. Consequently, if $f$ is such that
$E|Z'_n| < \infty$, then the compensator of $f(X)$ is given by (9.25) with $m_n$ replaced
by $m'_n = E(Z'_{n+1}\,|\,\mathcal{F}_{T_n})$.


Theorem 9.14 Let $X(t)$ be a pure jump point process generated by the
sequence $(T_n, Z_n)$. Assume conditions and notations of Theorem 9.12 and that
the compensator $A(t)$ is continuous. Suppose in addition that $EZ_n^2 < \infty$ and
$v_n = E(Z_{n+1}^2\,|\,\mathcal{F}_{T_n})$. Let $M(t) = X(t) - A(t)$, then $\langle M, M\rangle(t)$ is given by
(9.25) with $m_n$ replaced by $v_n$.


PROOF: Since $A$ is assumed to be continuous (and it is always of finite
variation),


$$[M, M](t) = [X - A, X - A](t) = [X, X](t) = \sum_{0 < s \le t} (\Delta X(s))^2. \tag{9.26}$$

Thus $[M, M]$ is a pure jump process with jump times $T_n$ and jump sizes
$(\Delta X(T_n))^2 = Z_n^2$, that is, it is generated by the sequence $(T_n, Z_n^2)$.
$\langle M, M\rangle(t)$ is its compensator. The result follows by Theorem 9.12.


A particular case of pure jump processes is the class of processes with
exponentially distributed inter-arrival times and independent jump sizes. This
is the class of Markov Jump processes.


9.5 Markov Jump Processes


Definitions

Let for all $t \ge 0$

$$X(t) = X(0) + \sum_{n=1}^{\infty} Z_n I(T_n \le t), \tag{9.27}$$

where $T_n$ and $Z_n$ have the following conditional distributions. Given that
$X(T_n) = x$, $T_{n+1} - T_n$ is exponentially distributed with mean $\lambda^{-1}(x)$, and
independent of the past; and the jump $Z_{n+1} = X(T_{n+1}) - X(T_n)$ is independent
of the past and has a distribution that depends only on $x$. That is,


$$F_n(t) = P(T_{n+1} - T_n \le t\,|\,\mathcal{F}_{T_n}) = 1 - e^{-\lambda(X(T_n))t}, \tag{9.28}$$


and for some family of distribution functions $K(x, \cdot)$


$$P(X(T_{n+1}) - X(T_n) \le t\,|\,\mathcal{F}_{T_n}) = K(X(T_n), t),$$
$$E(X(T_{n+1}) - X(T_n)\,|\,\mathcal{F}_{T_n}) = m(X(T_n)) = m_n. \tag{9.29}$$


It is intuitively clear that X defined in this way possesses the Markov 
property, due to the lack of memory of the exponential distribution. We omit 
the proof. 

Heuristically, Markov Jump processes can be described as follows. If the
process is in state $x$ then it stays there for an exponential length of time with
mean $\lambda^{-1}(x)$ (parameter $\lambda(x)$), after which it jumps from $x$ to a new state
$x + \xi(x)$, where $P(\xi(x) \le t) = K(x, t)$ denotes the distribution of the jump
from $x$. The parameters of the process are: the function $\lambda(x)$ (the holding
time parameter) and distributions $K(x, \cdot)$ (distributions of jump sizes).

$\lambda(x)$ is always non-negative. If for some $x$, $\lambda(x) = 0$, then once the process
gets into $x$ it stays in $x$ forever; in this case the state $x$ is called absorbing. We
shall assume that there are no absorbing states. If $\lambda(x) = \infty$ then the process
leaves $x$ instantaneously. We assume that $\lambda(x)$ is finite on finite intervals, so
that there are no instantaneous states.
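
The heuristic description translates directly into a simulation recipe: draw an exponential holding time with parameter $\lambda(x)$, then draw a jump from $K(x, \cdot)$. The following is a minimal sketch of that recipe; the function names, the seed and the birth-death example (holding parameter $(b+d)x$, jumps $\pm 1$) are illustrative assumptions rather than part of the text.

```python
import random

def simulate_markov_jump(x0, holding_rate, draw_jump, t_end, rng):
    """Return (jump times, states) of a Markov jump process on [0, t_end].

    holding_rate(x) plays the role of lambda(x); draw_jump(x, rng) samples
    the jump xi(x) from K(x, .). Both are supplied by the caller.
    """
    times, states = [0.0], [x0]
    t, x = 0.0, x0
    while True:
        lam = holding_rate(x)
        if lam <= 0.0:                 # absorbing state: no further jumps
            break
        t += rng.expovariate(lam)      # exponential holding time, mean 1/lam
        if t > t_end:
            break
        x = x + draw_jump(x, rng)      # new state x + xi(x)
        times.append(t)
        states.append(x)
    return times, states

# Hypothetical example: birth-death process with per-capita rates b and d
b, d = 1.0, 0.5
rng = random.Random(1)
times, states = simulate_markov_jump(
    10,
    lambda x: (b + d) * x,
    lambda x, r: 1 if r.random() < b / (b + d) else -1,
    2.0,
    rng,
)
print(len(times) - 1, "jumps; final state", states[-1])
```

In this example state 0 has $\lambda(0) = 0$, so it is absorbing, and the loop simply stops there.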

For construction, classification and properties of Markov Jump processes 
see for example, Breiman (1968), Chung (1967), Ethier and Kurtz (1986). 


The Compensator and the Martingale 


We derive the compensator heuristically first and then give the precise result.
Suppose that $X(t) = x$. By the lack of memory property of the exponential
distribution, it does not matter how long the process has already spent in $x$,
the jump from $x$ will still occur after an exponentially distributed (with parameter
$\lambda(x)$) random time. Therefore the conditional probability that a jump occurs
in $(t, t + dt)$, given that it has not occurred before, is (with $U$ having the exponential
$\exp(\lambda(x))$ distribution)


$$P(U \le dt) = 1 - \exp(-\lambda(x)\,dt) \approx \lambda(x)\,dt.$$

When the jump occurs its size is $\xi(x)$, with mean $m(x) = E(\xi(x))$. Therefore

$$dA(t) = E(dX(t)\,|\,\mathcal{F}_{t-}) = \lambda(X(t))\,m(X(t))\,dt. \tag{9.30}$$


Assume that $\lim_{n\to\infty} T_n = T_\infty = \infty$ (such processes are called regular, and
sufficient conditions for this are given in a later section). If $T_\infty < \infty$ then the
compensator is given for times $t < T_\infty$.


Theorem 9.15 Let $X$ be a Markov jump process such that for all $x$, the
holding time parameter is positive, $\lambda(x) > 0$, and the size of the jump from $x$ is
integrable with mean $m(x)$.


1. The compensator of $X$ is given by

$$A(t) = \int_0^t \lambda(X(s))\,m(X(s))\,ds. \tag{9.31}$$


2. Suppose the second moments of the jumps are finite, $v(x) = E\xi^2(x) < \infty$.
Then the sharp bracket of the local martingale $M(t) = X(t) - A(t)$ is
given by


$$\langle M, M\rangle(t) = \int_0^t \lambda(X(s))\,v(X(s))\,ds. \tag{9.32}$$


PROOF: The formula (9.31) follows from (9.25). Indeed, using the exponential
form of $F_n$ (9.28), we have


$$A(t) = \sum_{n=0}^{\infty} m(X(T_n))\,\lambda(X(T_n))\,(t \wedge T_{n+1} - t \wedge T_n). \tag{9.33}$$


Note that since the process $X$ has a constant value $X(T_i)$ on the time interval
$[T_i, T_{i+1})$, we have for a function $f$


$$\int_0^t f(X(s))\,ds = \sum_{i=0}^{\infty} f(X(T_i))\,(t \wedge T_{i+1} - t \wedge T_i), \tag{9.34}$$


and taking $f(x) = \lambda(x)m(x)$ gives the result. The formula (9.32) follows from
Theorem 9.14.


9.6 Stochastic Equation for Jump Processes


It is clear from Theorem 9.15 that a Markov Jump process $X$ has a
semimartingale representation


$$X(t) = X(0) + A(t) + M(t) = X(0) + \int_0^t \lambda(X(s))\,m(X(s))\,ds + M(t). \tag{9.35}$$

This is the integral form of the stochastic differential equation for $X$

$$dX(t) = \lambda(X(t))\,m(X(t))\,dt + dM(t), \tag{9.36}$$


and is driven by the purely discontinuous, finite variation martingale $M$. By
analogy with diffusions, the infinitesimal mean is $\lambda(x)m(x)$ and the infinitesimal
variance is $\lambda(x)v(x)$.

It is useful to have conditions assuring that the local martingale M in the 
representation (9.35) is a martingale. Such conditions are given in the next 
result. 


Theorem 9.16 Suppose that for all $x$

$$\lambda(x)\,E(|\xi(x)|) \le C(1 + |x|), \tag{9.37}$$

then representation (9.35) holds with a zero mean martingale $M$. If in addition

$$\lambda(x)\,v(x) \le C(1 + x^2), \tag{9.38}$$


then $M$ is square integrable with


$$\langle M, M\rangle(t) = \int_0^t \lambda(X(s))\,v(X(s))\,ds. \tag{9.39}$$


In particular, we have the following corollary. 


Corollary 9.17 Suppose that the conditions of Theorem 9.16 hold.
Then


$$EX(t) = EX(0) + E\int_0^t \lambda(X(s))\,m(X(s))\,ds, \tag{9.40}$$


$$\mathrm{Var}(M(t)) = E\int_0^t \lambda(X(s))\,v(X(s))\,ds. \tag{9.41}$$


The proof of the result can be found in Hamza and Klebaner (1995). 
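
As a quick sanity check of these formulae, consider the simplest case, chosen here purely for illustration: a constant holding parameter $\lambda(x) = \lambda$ and a jump distribution that does not depend on $x$, with mean $m$ and second moment $v$ (a compound Poisson process). The growth conditions of Theorem 9.16 then hold with $C = \lambda \max(E|\xi|, v)$, and the integrands are constants, so

```latex
\begin{align*}
A(t) &= \int_0^t \lambda m \, ds = \lambda m t, \\
E X(t) &= E X(0) + \lambda m t, \\
\mathrm{Var}(M(t)) &= E \int_0^t \lambda v \, ds = \lambda v t .
\end{align*}
```

These are the familiar mean and variance of a compound Poisson process, which suggests the general expressions are at least consistent with the classical case.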
An application of the stochastic equation approach to the model of
Birth-Death processes is given in Chapter 13.


Remark 9.2: Markov Jump processes with countably or finitely many states
are called Markov Chains. The jump variables have discrete distributions
in a Markov Chain, whereas they can have a continuous distribution in a
Markov process. Markov Jump processes are also known as Markov Chains
with general state space.


Remark 9.3: A model for a randomly evolving population can be provided by a
Markov Chain on non-negative integers. In this case the states of the process
are the possible values for the population size. The traditional way of defining
a Markov Chain is by the infinitesimal probabilities: for integer $i$ and $j$ and
small $\delta$

$$P(X(t + \delta) = j\,|\,X(t) = i) = \lambda_{ij}\,\delta + o(\delta),$$

where $\lim_{\delta \to 0} o(\delta)/\delta = 0$. In this section we presented an almost sure, path
by path representation of a Markov Jump process. Another representation
related to the Poisson process can be found in Ethier and Kurtz (1986).


Generators and Dynkin’s Formula 


Let $X$ be the Markov Jump process described by (9.27). Suppose that $\lambda(x)$ is
bounded, that is, $\sup_x \lambda(x) < \infty$. Define the following linear operator $L$ acting
on bounded functions by


$$Lf(x) = \lambda(x)\,E\big(f(x + \xi(x)) - f(x)\big) = \lambda(x)\,m_f(x). \tag{9.42}$$

Theorem 9.18 (Dynkin’s Formula) Let $L$ be as above, and define $M^f(t)$
by

$$M^f(t) = f(X(t)) - f(X(0)) - \int_0^t Lf(X(s))\,ds. \tag{9.43}$$


Then $M^f$ is a martingale, and consequently


$$Ef(X(t)) = f(X(0)) + E\int_0^t Lf(X(s))\,ds. \tag{9.44}$$


PROOF: The proof of these formulae follows from Theorem 9.13 and
Theorem 9.15. The process $f(X(t))$ is a pure jump process with the same arrival
times $T_n$, and jumps of size $Z'_n = f(X(T_n)) - f(X(T_n-))$ (in this case it is
also Markov). The jumps are bounded, since $f$ is bounded. The mean of
the jump from $x$ is given by
$E(Z'_{n+1}\,|\,\mathcal{F}_{T_n}, X(T_n) = x) = E(f(x + \xi(x)) - f(x)) = m_f(x)$.
Therefore the compensator of $f(X(t))$ is, from Theorem 9.15,
$\int_0^t \lambda(X(s))\,m_f(X(s))\,ds$. This implies that $M^f$ is a local martingale. If $\lambda(x)$ is
bounded, then $M^f(t)$ is bounded by a constant $Ct$ on any finite time interval
$[0, t]$. Since a bounded local martingale is a martingale, $M^f$ is a martingale.
Moreover it is square integrable on any finite time interval.

The linear operator $L$ in (9.42) is called the generator of $X$. It can be
shown that with its help one can define probabilities on the space of
right-continuous functions, so that the coordinate process is the Markov process, cf.
solution to the martingale problem for diffusions.

Theorem 9.16 above can be seen as a generalization of Dynkin’s formula for
the identity function $f(x) = x$ and the quadratic function $f(x) = x^2$. These
are particular cases of a more general theorem.


Theorem 9.19 Assume that there are no explosions, that is, $T_n \uparrow \infty$.
Suppose that for an unbounded function $f$ there is a constant $C$ such that for all
$x$


$$\lambda(x)\,E\big|f(x + \xi(x)) - f(x)\big| \le C(1 + |f(x)|). \tag{9.45}$$


Suppose also that $E|f(X(0))| < \infty$. Then for all $t$, $0 \le t < \infty$, $E|f(X(t))| < \infty$;
moreover $M^f$ in (9.43) is a martingale and (9.44) holds.


For the proof see Hamza and Klebaner (1995). 


9.7 Explosions in Markov Jump Processes 


Let $X(t)$ be a jump Markov process taking values in $\mathbb{R}$. If the process is
in state $x$ then it stays there for an exponential length of time with mean
$\lambda^{-1}(x)$, after which it jumps from $x$. If $\lambda(x) \to \infty$ for a set of values $x$, then
the expected duration of stay in state $x$, $\lambda^{-1}(x) \to 0$, and the time spent in $x$
becomes shorter and shorter. It can happen that the process jumps infinitely
many times in a finite time interval, that is, $\lim_{n\to\infty} T_n = T_\infty < \infty$. This
phenomenon is called explosion. The terminology comes from the case when
the process takes only integer values and $\lambda(x)$ can tend to infinity only when
$x \to \infty$. In this case there are infinitely many jumps only if the process reaches
infinity in finite time.

When the process does not explode it is called regular. In other words, the
regularity assumption is


$$T_\infty = \lim_{n\to\infty} T_n = \infty \quad \text{a.s.} \tag{9.46}$$


If $P(T_\infty < \infty) > 0$, then it is said that the process explodes on the set $\{T_\infty < \infty\}$.
A necessary and sufficient condition for non-explosion is given by


Theorem 9.20

$$\sum_{n=0}^{\infty} \frac{1}{\lambda(X(T_n))} = \infty \quad \text{a.s.} \tag{9.47}$$


PROOF: We sketch the proof, see Breiman (1968) for details. Let $\nu_n$ be
a sequence of independent exponentially distributed random variables with
parameters $\lambda_n$. Then it is easy to see (by taking a Laplace transform) that
$\sum_{n=0}^{\infty} \nu_n$ converges in distribution if and only if $\sum_{n=0}^{\infty} 1/\lambda_n < \infty$. Using the
result that a series of independent random variables converges almost surely if
and only if it converges in distribution, we have $\sum_{n=0}^{\infty} \nu_n < \infty$ almost surely if
and only if $\sum_{n=0}^{\infty} 1/\lambda_n < \infty$. Condition (9.47) now follows, since the conditional
distribution of $T_{n+1} - T_n$, given the information up to the $n$-th jump, $\mathcal{F}_{T_n}$, is
exponential with parameter $\lambda(X(T_n))$.
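
To see condition (9.47) at work, one can look at a pure birth process (jumps of size $+1$) started at 1 with $\lambda(x) = x^2$, an example assumed here for illustration. Then $\sum_n 1/\lambda(X(T_n)) = \sum_n 1/n^2 < \infty$, so the process explodes, and the expected explosion time is $\sum_n 1/n^2 = \pi^2/6$. The sketch below simulates the time needed for the first 300 jumps and compares its average with the corresponding partial sum; the names, seed and truncation level are arbitrary choices.

```python
import math
import random

def time_to_level(rate, start, levels, rng):
    """Time for a pure birth process (jumps +1) started at `start`
    to make `levels` jumps, given holding parameter rate(x)."""
    return sum(rng.expovariate(rate(x)) for x in range(start, start + levels))

rng = random.Random(2)
levels, runs = 300, 3000

# Holding parameter lambda(x) = x^2: holding times are summable
samples = [time_to_level(lambda x: float(x * x), 1, levels, rng)
           for _ in range(runs)]
mean_time = sum(samples) / runs

# Expected time for the first `levels` jumps: partial sum of 1/n^2
expected = sum(1.0 / (n * n) for n in range(1, levels + 1))
print(mean_time, expected, math.pi ** 2 / 6)
```

With $\lambda(x) = x$ instead, the corresponding sum is the harmonic series, which diverges, so by Theorem 9.20 that process is regular.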


Condition (9.47) is hard to check in general, since it involves the
variables $X(T_n)$. A simple condition, which is essentially in terms of the function
$\lambda(x)m(x)$ (the drift), is given below.


Theorem 9.21 Assume that $X \ge 0$, and there exists a monotone function
$f(x)$ such that $\lambda(x)m(x) \le f(x)$ and $\int_0^\infty \frac{dx}{f(x)} = \infty$. Then the process $X$ does
not explode, that is, $P(X(t) < \infty \text{ for all } t,\ 0 \le t < \infty) = 1$.


According to this, the first condition of the result on integrability, Theorem 
9.16, guarantees that the process does not explode. Proof of Theorem 9.21 
and other sharp sufficient conditions for regularity in terms of the parameters 
of the process are given in Kersting and Klebaner (1995), (1996). 


Notes. Material for this chapter is based on Liptser and Shiryayev (1974),
(1989), Karr (1986), and the research papers quoted in the text.


9.8 Exercises


Exercise 9.1: Let $U \ge 0$ and denote $h(x) = \int_0^x \frac{dF(s)}{1 - F(s-)}$, where $F(x)$ is the
distribution function of $U$. Show that for $a > 0$, $Eh(U \wedge a) = \int_0^a dF(x)$.


Exercise 9.2: Show that when the distribution of inter-arrival times is
exponential the formula (9.18) gives the compensator $\lambda t$, hence $N(t)$ is a Poisson
process.


Exercise 9.3: Show that when the distribution of inter-arrival times is
Geometric then $N(t)$ has a Binomial distribution.


Exercise 9.4: Prove (9.25).

Exercise 9.5: Let $X(t)$ be a pure jump process with first $k$ moments. Let
$m_k(x) = E\xi^k(x)$ and assume for all $i = 1, \ldots, k$, $\lambda(x)E|\xi(x)|^i \le C|x|^i$. Show
that


$$EX^k(t) = EX^k(0) + \sum_{i=0}^{k-1} \binom{k}{i} \int_0^t E\big(\lambda(X(s))\,X^i(s)\,m_{k-i}(X(s))\big)\,ds.$$


Chapter 10


Change of Probability 
Measure 


In this chapter we describe what happens to random variables and processes 
when the original probability measure is changed to an equivalent one. Change 
of measure for processes is done by using Girsanov’s theorem. 


10.1 Change of Measure for Random Variables 


Change of Measure on a Discrete Probability Space 


We start with a simple example. Let $\Omega = \{\omega_1, \omega_2\}$ with probability measure
$P$ given by $P(\omega_1) = p$, $P(\omega_2) = 1 - p$.


Definition 10.1 $Q$ is equivalent to $P$ ($Q \sim P$) if they have the same null sets,
i.e. $Q(A) = 0$ if and only if $P(A) = 0$.


Let $Q$ be a new probability measure equivalent to $P$. In this example, this
means that $Q(\omega_1) > 0$ and $Q(\omega_2) > 0$ (or $0 < Q(\omega_1) < 1$). Put $Q(\omega_1) = q$,
$0 < q < 1$.
Let now $\Lambda(\omega) = \frac{Q(\omega)}{P(\omega)}$, that is, $\Lambda(\omega_1) = \frac{Q(\omega_1)}{P(\omega_1)} = \frac{q}{p}$, and $\Lambda(\omega_2) = \frac{1-q}{1-p}$.
By definition of $\Lambda$, for all $\omega$


$$Q(\omega) = \Lambda(\omega)P(\omega). \tag{10.1}$$


Let now $X$ be a random variable. The expectation of $X$ under the probability
$P$ is given by


$$E_P(X) = X(\omega_1)P(\omega_1) + X(\omega_2)P(\omega_2) = pX(\omega_1) + (1-p)X(\omega_2)$$

and under the probability $Q$

$$E_Q(X) = X(\omega_1)Q(\omega_1) + X(\omega_2)Q(\omega_2).$$

From (10.1)

$$E_Q(X) = X(\omega_1)\Lambda(\omega_1)P(\omega_1) + X(\omega_2)\Lambda(\omega_2)P(\omega_2) = E_P(\Lambda X). \tag{10.2}$$

Take $X = 1$, then

$$E_Q(X) = 1 = E_P(\Lambda). \tag{10.3}$$


On the other hand, take any random variable $\Lambda > 0$, such that $E_P(\Lambda) = 1$,
and define $Q$ by (10.1).
Then $Q$ is a probability, because $Q(\omega_i) = \Lambda(\omega_i)P(\omega_i) > 0$, $i = 1, 2$, and


$$Q(\Omega) = Q(\omega_1) + Q(\omega_2) = \Lambda(\omega_1)P(\omega_1) + \Lambda(\omega_2)P(\omega_2) = E_P(\Lambda) = 1.$$


$Q$ is equivalent to $P$, since $\Lambda$ is strictly positive.

Thus we have shown that for any equivalent change of measure $Q \sim P$ there
is a positive random variable $\Lambda$, such that $E_P(\Lambda) = 1$, and $Q(\omega) = \Lambda(\omega)P(\omega)$.
This is the simplest version of the general result, the Radon-Nikodym Theorem.
The expectation under $Q$ of a random variable $X$ is given by


$$E_Q(X) = E_P(\Lambda X). \tag{10.4}$$

By taking indicators $I(X \in A)$ we obtain the distribution of $X$ under $Q$

$$Q(X \in A) = E_P(\Lambda I(X \in A)).$$


These formulae, obtained here for a simple example, hold also in general. 
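
The two-point example is small enough to check by exact arithmetic. The sketch below does so with Python's `fractions` module; the particular values of $p$, $q$ and $X$ are arbitrary assumptions made only for the illustration.

```python
from fractions import Fraction

# Two-point space with P(w1) = p and Q(w1) = q (illustrative values)
p, q = Fraction(1, 2), Fraction(1, 5)
P = {"w1": p, "w2": 1 - p}
Q = {"w1": q, "w2": 1 - q}

# Density Lambda(w) = Q(w) / P(w)
Lam = {w: Q[w] / P[w] for w in P}

# An arbitrary random variable X
X = {"w1": Fraction(3), "w2": Fraction(-1)}

E_Q_X = sum(X[w] * Q[w] for w in P)                 # expectation under Q
E_P_LamX = sum(Lam[w] * X[w] * P[w] for w in P)     # E_P(Lambda X)
E_P_Lam = sum(Lam[w] * P[w] for w in P)             # E_P(Lambda)

print(E_Q_X, E_P_LamX, E_P_Lam)
```

Both expectations come out equal, and $E_P(\Lambda) = 1$, as the identities $E_Q(X) = E_P(\Lambda X)$ and $E_P(\Lambda) = 1$ require.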


Change of Measure for Normal Random Variables 


Consider first a change of a Normal probability measure on $\mathbb{R}$. Let $\mu$ be any
real number, $f_\mu(x)$ denote the probability density of the $N(\mu, 1)$ distribution, and
$P_\mu$ the $N(\mu, 1)$ probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Then, it is easy to see that


$$f_\mu(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-\mu)^2} = f_0(x)\, e^{\mu x - \frac{1}{2}\mu^2}. \tag{10.5}$$

Put

$$\Lambda(x) = e^{\mu x - \frac{1}{2}\mu^2}. \tag{10.6}$$


Then the above equation reads


$$f_\mu(x) = f_0(x)\Lambda(x). \tag{10.7}$$

By the definition of a density function, the probability of a set $A$ on the
line is the integral of the density over this set,


$$P(A) = \int_A f(x)\,dx = \int_A dP. \tag{10.8}$$

In infinitesimal notations this relation is written as

$$dP = P(dx) = f(x)\,dx. \tag{10.9}$$


Hence the relation between the densities (10.7) can be written as a relation
between the corresponding probability measures


$$f_\mu(x)\,dx = f_0(x)\Lambda(x)\,dx, \quad \text{and} \quad P_\mu(dx) = \Lambda(x)P_0(dx). \tag{10.10}$$


By a property of the expectation (integral) (2.3), if a random variable $X \ge 0$
then $EX = 0$ if and only if $P(X = 0) = 1$. Hence


$$P_\mu(A) = \int_A \Lambda(x)P_0(dx) = E_{P_0}(I_A \Lambda) = 0$$


implies

$$P_0(I_A \Lambda = 0) = 1.$$


Since $\Lambda(x) > 0$ for all $x$, this implies $P_0(A) = 0$. Using that $\Lambda < \infty$, the other
direction follows: if $P_0(A) = 0$ then $P_\mu(A) = E_{P_0}(I_A \Lambda) = 0$, which proves
that these measures have the same null sets and are equivalent.

Another notation for $\Lambda$ is


$$\Lambda = \frac{dP_\mu}{dP_0}, \quad \text{and} \quad \frac{dP_\mu}{dP_0}(x) = e^{\mu x - \frac{1}{2}\mu^2}. \tag{10.11}$$


This shows that any $N(\mu, 1)$ probability is obtained by an equivalent change
of probability measure from the $N(0, 1)$ distribution.

We now give the same result in terms of changing the distribution of a 
random variable. 


Theorem 10.2 Let $X$ have $N(0,1)$ distribution under $P$, and $\Lambda(X) = e^{\mu X - \mu^2/2}$. Define
measure $Q$ by


$$Q(A) = \int_A \Lambda(X)\,dP = E_P(I_A \Lambda(X)) \quad \text{or} \quad \frac{dQ}{dP}(X) = \Lambda(X). \tag{10.12}$$


Then $Q$ is an equivalent probability measure, and $X$ has $N(\mu,1)$ distribution
under $Q$.

PROOF: We show first that $Q$ defined by (10.12) is indeed a probability.
$Q(\Omega) = E_Q(1) = E_P(\Lambda(X)) = E_P(e^{\mu X - \mu^2/2}) = e^{-\mu^2/2}E_P(e^{\mu X}) = 1$.
The last equality is because $E_P(e^{\mu X}) = e^{\mu^2/2}$ for the $N(0,1)$ random
variable. Other properties of probability $Q$ follow from the corresponding
properties of the integral. Consider the moment generating function of $X$ under
$Q$: $E_Q(e^{uX}) = E_P(e^{uX}\Lambda(X)) = E_P(e^{(u+\mu)X - \mu^2/2}) = e^{-\mu^2/2}E_P(e^{(u+\mu)X}) = e^{u\mu + u^2/2}$,
which corresponds to the $N(\mu, 1)$ distribution.


Remark 10.1: If $P$ is a probability such that $X$ is $N(0,1)$, then $X + \mu$ has
$N(\mu, 1)$ distribution under the same $P$. This is an operation on the outcomes $x$.
When we change the probability measure $P$ to $Q = P_\mu$, we leave the outcomes
as they are, but assign different probabilities to them (more precisely to sets
of outcomes). Under the new measure the same $X$ has $N(\mu,1)$ distribution.


Example 10.1: (Simulations of rare events)

Change of probability measure is useful in simulation and estimation of
probabilities of rare events. Consider estimation of $\Pr(N(6,1) < 0)$ by simulations. This
probability is about $10^{-9}$. Direct simulation is done by the expression


$$\Pr(N(6,1) < 0) \approx \frac{1}{n}\sum_{i=1}^{n} I(x_i < 0), \tag{10.13}$$

where the $x_i$'s are the observed values from the $N(6,1)$ distribution. Note that in a
million runs, $n = 10^6$, we should expect no values below zero, and the estimate is 0.
Let $Q$ be the $N(6,1)$ distribution on $\mathbb{R}$. Consider $Q$ as changed from $N(0,1) = P$,
as follows

$$\frac{dQ}{dP}(x) = \frac{dN(6,1)}{dN(0,1)}(x) = \Lambda(x) = e^{\mu x - \mu^2/2} = e^{6x - 18}.$$

So we have


$$Q(A) = N(6,1)(A) = E_P(\Lambda(x)I(A)) = E_{N(0,1)}(e^{6x-18}I(A)).$$


In our case $A = (-\infty, 0)$, and $\Pr(N(6,1) < 0) = Q(A)$.
Thus we are led to the following estimate


$$\Pr(N(6,1) < 0) = Q(A) \approx \frac{1}{n}\sum_{i=1}^{n} e^{6x_i - 18} I(x_i < 0) = e^{-18}\,\frac{1}{n}\sum_{i=1}^{n} e^{6x_i} I(x_i < 0), \tag{10.14}$$

with $x_i$ generated from the $N(0,1)$ distribution. Of course, about half of the observations
$x_i$'s will be negative, resulting in a more precise estimate, even for a small number
of runs $n$.
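
The estimate (10.14) is easy to try out. The sketch below draws from $N(0,1)$, weights each negative observation by $e^{6x_i - 18}$, and compares the result with the value of $\Phi(-6)$ computed through the complementary error function. The sample size, seed and function name are assumptions made for the illustration.

```python
import math
import random

def importance_estimate(mu, n, rng):
    """Estimate Pr(N(mu, 1) < 0) from N(0, 1) samples, weighting each
    sample by the density ratio exp(mu * x - mu^2 / 2)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        if x < 0.0:
            total += math.exp(mu * x - 0.5 * mu * mu)
    return total / n

rng = random.Random(3)
estimate = importance_estimate(6.0, 200_000, rng)

# Reference value: Pr(N(6,1) < 0) = Phi(-6) = erfc(6 / sqrt(2)) / 2
exact = 0.5 * math.erfc(6.0 / math.sqrt(2.0))
print(estimate, exact)
```

Direct simulation from $N(6,1)$ with the same number of runs would almost surely return 0, whereas the weighted estimate is typically within a few percent of the reference value.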


Next we give the result for random variables, set up in a way similar to
Girsanov’s theorem for processes, changing measure to remove the mean.

Theorem 10.3 (Removal of the mean) Let $X$ have $N(0,1)$ distribution
under $P$, and $Y = X + \mu$. Then there is an equivalent probability $Q \sim P$,
such that $Y$ has $N(0,1)$ distribution under $Q$. $(dQ/dP)(X) = \Lambda(X) = e^{-\mu X - \mu^2/2}$.


PROOF: Similarly to the previous proof, $Q$ is a probability measure, with
the same null sets as $P$.


$$E_Q(e^{uY}) = E_P(e^{u(X+\mu)}\Lambda(X)) = E_P(e^{(u-\mu)X + u\mu - \mu^2/2}) = e^{u\mu - \mu^2/2}E_P(e^{(u-\mu)X}).$$


Using the moment generating function of $N(0,1)$, $E_P(e^{(u-\mu)X}) = e^{(u-\mu)^2/2}$,
which gives $E_Q(e^{uY}) = e^{u^2/2}$, establishing $Y \sim N(0,1)$ under $Q$.


Finally, we give the change of one Normal probability to another.

Theorem 10.4 Let $X$ have $N(\mu_1, \sigma_1^2)$ distribution, call it $P$. Define $Q$ by


$$(dQ/dP)(X) = \Lambda(X) = \frac{\sigma_1}{\sigma_2}\, e^{\frac{(X-\mu_1)^2}{2\sigma_1^2} - \frac{(X-\mu_2)^2}{2\sigma_2^2}}. \tag{10.15}$$


Then $X$ has $N(\mu_2, \sigma_2^2)$ distribution under $Q$.


PROOF: The form of $\Lambda$ is easily verified as the ratio of Normal densities.
This proves the statement for $P$ and $Q$ on $\mathbb{R}$. On a general space, the proof
is similar to the above, by working out the moment generating function of $X$
under $Q$.


10.2 Change of Measure on a General Space 


Let two probability measures P and Q be defined on the same space. 


Definition 10.5 $Q$ is called absolutely continuous with respect to $P$, written
as $Q \ll P$, if $Q(A) = 0$ whenever $P(A) = 0$. $P$ and $Q$ are called equivalent if
$Q \ll P$ and $P \ll Q$.


Theorem 10.6 (Radon-Nikodym) If $Q \ll P$, then there exists a random
variable $\Lambda$, such that $\Lambda \ge 0$, $E_P\Lambda = 1$, and


$$Q(A) = E_P(\Lambda I(A)) = \int_A \Lambda\,dP \tag{10.16}$$


for any measurable set $A$. $\Lambda$ is $P$-almost surely unique. Conversely, if there
exists a random variable $\Lambda$ with the above properties and $Q$ is defined by (10.16),
then it is a probability measure and $Q \ll P$.

The random variable $\Lambda$ in the above theorem is called the Radon-Nikodym
derivative or the density of $Q$ with respect to $P$, and is denoted by $dQ/dP$.
It follows from (10.16) that if $Q \ll P$, then expectations under $P$ and $Q$ are
related by

$$E_Q X = E_P(\Lambda X), \tag{10.17}$$


for any random variable $X$ integrable with respect to $Q$.
Calculations of expectations are sometimes made easier by using a change 
of measure. 


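Example 10.2: (A Lognormal calculation for a financial option) Consider $E(e^X I(X > a))$,
where $X$ has $N(\mu, 1)$ distribution ($P$). Take $\Lambda(X) = e^X/Ee^X = e^{X - \mu - \frac{1}{2}}$, and
$dQ = \Lambda(X)\,dP$. Then $E\Lambda(X) = 1$, $0 < \Lambda(X) < \infty$, so $Q \sim P$.

$$E_P(e^X I(X > a)) = e^{\mu + \frac{1}{2}}E_P(\Lambda(X)I(X > a)) = e^{\mu + \frac{1}{2}}E_Q(I(X > a)) = e^{\mu + \frac{1}{2}}Q(X > a).$$


$$E_Q(e^{uX}) = E_P(e^{uX}\Lambda(X)) = e^{-\mu - \frac{1}{2}}E_P(e^{(u+1)X}) = e^{u\mu + u + u^2/2},$$


which is the transform of the $N(\mu + 1, 1)$ distribution. Thus the $Q$ distribution of $X$ is
$N(\mu + 1, 1)$, and $E(e^X I(X > a)) = e^{\mu + \frac{1}{2}}Q(X > a) = e^{\mu + \frac{1}{2}}\Pr(N(\mu + 1, 1) > a) = e^{\mu + \frac{1}{2}}(1 - \Phi(a - \mu - 1))$.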
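
The change-of-measure answer can be cross-checked against a direct numerical integration of $e^x$ against the $N(\mu,1)$ density over $(a, \infty)$. The following sketch compares the closed form $e^{\mu + 1/2}(1 - \Phi(a - \mu - 1))$ with a trapezoidal approximation; the values of $\mu$ and $a$, the truncation point and the function names are assumptions for the illustration.

```python
import math

def normal_cdf(z):
    # Standard Normal distribution function Phi(z)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def closed_form(mu, a):
    # e^{mu + 1/2} * (1 - Phi(a - mu - 1)), from the change of measure
    return math.exp(mu + 0.5) * (1.0 - normal_cdf(a - mu - 1.0))

def numeric(mu, a, upper=12.0, steps=20_000):
    """Trapezoidal approximation of the integral of e^x times the
    N(mu, 1) density over (a, mu + upper); the tail beyond is negligible."""
    hi = mu + upper
    h = (hi - a) / steps

    def g(x):
        return math.exp(x - 0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

    total = 0.5 * (g(a) + g(hi)) + sum(g(a + i * h) for i in range(1, steps))
    return total * h

mu, a = 0.3, 1.0
value_closed = closed_form(mu, a)
value_numeric = numeric(mu, a)
print(value_closed, value_numeric)
```

The two numbers should agree to several decimal places, which is a useful check that the factor $e^{\mu + 1/2}$ has been carried through correctly.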


We give the definition of the “opposite” concept to the absolute continuity 
of two measures. 


Definition 10.7 Two probability measures $P$ and $Q$ defined on the same space
are called singular if there exists a set $A$ such that $P(A) = 0$ and $Q(A) = 1$.


Singularity means that by observing an outcome we can decide with certainty
on the probability model.


Example 10.3: 1. Let $\Omega = \mathbb{R}^+$, $P$ be the exponential distribution with parameter
1, and $Q$ be the Poisson distribution with parameter 1. Then $P$ and $Q$ are singular.
Indeed the set of non-negative integers has $Q$ probability 1, and $P$ probability 0.

2. We shall see later that the probability measures induced by processes $\sigma B(t)$,
where $B$ is a Brownian motion, are singular for different values of $\sigma$.


Expectations for an absolutely continuous change of probability measure are 
related by the formula (10.17). Conditional expectations are related by an 
equation, similar to Bayes formula. 


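Theorem 10.8 (General Bayes formula) Let $\mathcal{G}$ be a sub-$\sigma$-field of $\mathcal{F}$ on
which two probability measures $Q$ and $P$ are defined. If $Q \ll P$ with $dQ = \Lambda\,dP$
and $X$ is $Q$-integrable, then $\Lambda X$ is $P$-integrable and $Q$-a.s.


$$E_Q(X\,|\,\mathcal{G}) = \frac{E_P(X\Lambda\,|\,\mathcal{G})}{E_P(\Lambda\,|\,\mathcal{G})}. \tag{10.18}$$

PROOF: We check the definition of conditional expectation given $\mathcal{G}$, (2.15),
for any bounded $\mathcal{G}$-measurable random variable $\xi$


$$E_Q(\xi X) = E_Q(\xi E_Q(X\,|\,\mathcal{G})).$$

Clearly the rhs of (10.18) is $\mathcal{G}$-measurable.

$$E_Q\left(\frac{E_P(X\Lambda\,|\,\mathcal{G})}{E_P(\Lambda\,|\,\mathcal{G})}\,\xi\right) = E_P\left(\Lambda\,\frac{E_P(X\Lambda\,|\,\mathcal{G})}{E_P(\Lambda\,|\,\mathcal{G})}\,\xi\right) = E_P\left(E_P(\Lambda\,|\,\mathcal{G})\,\frac{E_P(X\Lambda\,|\,\mathcal{G})}{E_P(\Lambda\,|\,\mathcal{G})}\,\xi\right)$$

$$= E_P\big(E_P(X\Lambda\,|\,\mathcal{G})\,\xi\big) = E_P(X\Lambda\xi) = E_Q(X\xi).$$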


In the rest of this section we address the general case of two measures, not
necessarily equivalent. Let $P$ and $Q$ be two measures on the same probability
space. Consider the measure $\nu = \frac{P + Q}{2}$. Then, clearly, $P \ll \nu$ and $Q \ll \nu$.
Therefore, by the Radon-Nikodym Theorem, there exist $\Lambda_1$ and $\Lambda_2$, such that
$dP/d\nu = \Lambda_1$ and $dQ/d\nu = \Lambda_2$. Introduce the notation $x^{(+)} = 1/x$, if $x \ne 0$,
and 0 otherwise, so that $x^{(+)}x = I(x \ne 0)$. We have


$$Q(A) = \int_A \Lambda_2\,d\nu = \int_A \Lambda_2\big(\Lambda_1^{(+)}\Lambda_1 + 1 - \Lambda_1^{(+)}\Lambda_1\big)\,d\nu$$

$$= \int_A \Lambda_2\Lambda_1^{(+)}\Lambda_1\,d\nu + \int_A (1 - \Lambda_1^{(+)}\Lambda_1)\Lambda_2\,d\nu$$

$$= \int_A \Lambda_2\Lambda_1^{(+)}\,dP + \int_A (1 - \Lambda_1^{(+)}\Lambda_1)\,dQ = Q^c(A) + Q^s(A),$$

where $Q^c(A) = \int_A \Lambda_2\Lambda_1^{(+)}\,dP$ is absolutely continuous with respect to $P$, and
$Q^s(A) = \int_A (1 - \Lambda_1^{(+)}\Lambda_1)\,dQ$. The set $\Lambda = \{\Lambda_1 > 0\}$ has $P$ probability one,


$$P(\Lambda) = \int_\Lambda \Lambda_1\,d\nu = \int_\Omega \Lambda_1\,d\nu = P(\Omega) = 1,$$


but has measure zero under $Q^s$, because on this set $\Lambda_1^{(+)}\Lambda_1 = 1$ and $Q^s(\Lambda) = 0$.
This shows that $Q^s$ is singular with respect to $P$. Thus we have


Theorem 10.9 (Lebesgue Decomposition) Let $P$ and $Q$ be two measures
on the same probability space. Then $Q(A) = Q^c(A) + Q^s(A)$, where $Q^c \ll P$
and $Q^s \perp P$.


10.3 Change of Measure for Processes


Since realizations of a Brownian motion are continuous functions, the
probability space is taken to be the set of continuous functions on $[0,T]$, $\Omega = C([0,T])$.
Because we have to describe the collection of sets to which the probability is
assigned, the concept of an open (closed) set is needed. These are defined in
the usual way with the help of the distance between two functions taken as
$\sup_{t \le T} |\omega_1(t) - \omega_2(t)|$. The $\sigma$-field is the one generated by open sets.
Probability measures are defined on the measurable subsets of $\Omega$. If $\omega = \omega_{[0,T]}$ denotes
a continuous function, then there is a probability measure $P$ on this space such
that the “coordinate” process $B(t, \omega) = B(t, \omega_{[0,T]}) = \omega(t)$ is a Brownian
motion; $P$ is the Wiener measure, see Section 5.7. Note that although a Brownian
motion can be defined on $[0, \infty)$, the equivalent change of measure is defined
only on finite intervals $[0,T]$. A random variable on this space is a function
$X : \Omega \to \mathbb{R}$, and $X(\omega) = X(\omega_{[0,T]})$, also known as a functional of Brownian
motion. Since probabilities can be obtained as expectations of indicators,
$P(A) = EI_A$, it is important to know how to calculate expectations. $E(X)$ is
given as an integral with respect to the Wiener measure


$$E(X) = \int_\Omega X(\omega_{[0,T]})\,dP. \tag{10.19}$$


In particular, if $X(\omega_{[0,T]}) = h(\omega(T))$, for some function of real argument $h$,
then


$$E(X) = \int_\Omega h(\omega(T))\,dP = E(h(B(T))), \tag{10.20}$$


which can be evaluated by using the $N(0,T)$ distribution of $B(T)$. Similarly,
if $X$ is a function of finitely many values of $\omega$, $X(\omega_{[0,T]}) = h(\omega(t_1), \ldots, \omega(t_n))$,
then the expectation can be calculated by using the multivariate Normal
distribution of $B(t_1), \ldots, B(t_n)$. However, for a functional which depends on the
whole path $\omega_{[0,T]}$, such as integrals of $\omega(t)$ or $\max_{t \le T} \omega(t)$, the distribution of
the functional is required (10.21). It can be calculated by using the distribution
of $X$, $F_X$, as for any random variable


$$E(X) = \int_\Omega X(\omega_{[0,T]})\,dP = \int_{\mathbb{R}} x\,dF_X(x). \tag{10.21}$$


If $P$ and $Q$ are equivalent, that is, they have the same null sets, then there exists
a random variable $\Lambda$ (the Radon-Nikodym derivative $\Lambda = dQ/dP$), such that
the probabilities under $Q$ are given by $Q(A) = \int_A \Lambda\,dP$. Girsanov’s theorem
gives the form of $\Lambda$.

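First we state a general result, that follows directly from Theorem 10.8,
for the calculation of expectations and conditional expectations under an
absolutely continuous change of measure. It is also known as the general Bayes
formula.

Theorem 10.10 Let $\Lambda(t)$ be a positive $P$-martingale such that $E_P(\Lambda(T)) = 1$.
Define the new probability measure $Q$ by the relation $Q(A) = \int_A \Lambda(T)\,dP$ (so
that $dQ/dP = \Lambda(T)$). Then $Q$ is absolutely continuous with respect to $P$ and
for a $Q$-integrable random variable $X$


$$E_Q(X) = E_P(\Lambda(T)X), \tag{10.22}$$

$$E_Q(X\,|\,\mathcal{F}_t) = E_P\left(\frac{\Lambda(T)}{\Lambda(t)}X\,\Big|\,\mathcal{F}_t\right), \tag{10.23}$$

and if $X$ is $\mathcal{F}_t$-measurable, then for $s \le t$

$$E_Q(X\,|\,\mathcal{F}_s) = E_P\left(\frac{\Lambda(t)}{\Lambda(s)}X\,\Big|\,\mathcal{F}_s\right). \tag{10.24}$$


PROOF: It remains to show (10.24).

$$E_Q(X\,|\,\mathcal{F}_s) = E_P\left(\frac{\Lambda(T)}{\Lambda(s)}X\,\Big|\,\mathcal{F}_s\right) = E_P\left(E_P\left(\frac{\Lambda(T)}{\Lambda(s)}X\,\Big|\,\mathcal{F}_t\right)\Big|\,\mathcal{F}_s\right)$$

$$= E_P\left(\frac{X}{\Lambda(s)}E_P(\Lambda(T)\,|\,\mathcal{F}_t)\,\Big|\,\mathcal{F}_s\right) = E_P\left(\frac{\Lambda(t)}{\Lambda(s)}X\,\Big|\,\mathcal{F}_s\right).$$


The following result follows immediately from (10.24).

Corollary 10.11 A process $M(t)$ is a $Q$-martingale if and only if $\Lambda(t)M(t)$
is a $P$-martingale.

By taking $M(t) = 1/\Lambda(t)$, so that $\Lambda(t)M(t) = 1$, we obtain a result used in financial applications.

Theorem 10.12 Let $\Lambda(t)$ be a positive $P$-martingale such that $E_P(\Lambda(T)) = 1$,
and $dQ/dP = \Lambda(T)$. Then $1/\Lambda(t)$ is a $Q$-martingale.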


We show next that convergence in probability is preserved under an absolutely 
continuous change of measure. 


Theorem 10.13 Let $X_n \to X$ in probability $P$ and $Q \ll P$. Then $X_n \to X$
in probability $Q$.


PROOF: Denote $A_n = \{|X_n - X| > \varepsilon\}$. Then convergence in probability of
$X_n$ to $X$ means $P(A_n) \to 0$. But $Q(A_n) = E_P(\Lambda I_{A_n})$. Since $\Lambda$ is $P$-integrable,
the result follows by dominated convergence.


Corollary 10.14 The quadratic variation of a process does not change under 
an absolutely continuous change of the probability measure. 


PROOF: The sums $\sum_{i=0}^{n-1} (X(t_{i+1}) - X(t_i))^2$ approximating the quadratic
variation converge in $P$ probability to $[X, X](t)$. By the above result they
converge to the same limit under any probability $Q$ absolutely continuous with respect to $P$.

Theorem 10.15 (Girsanov’s theorem for Brownian motion) Let $B(t)$,
$0 \le t \le T$, be a Brownian motion under probability measure $P$. Consider the
process $W(t) = B(t) + \mu t$. Define the measure $Q$ by


$$\Lambda = \frac{dQ}{dP}(B_{[0,T]}) = e^{-\mu B(T) - \frac{1}{2}\mu^2 T}, \tag{10.25}$$

where $B_{[0,T]}$ denotes a path of Brownian motion on $[0,T]$. Then $Q$ is equivalent
to $P$, and $W(t)$ is a $Q$-Brownian motion. Moreover,


$$\frac{dP}{dQ}(W_{[0,T]}) = \frac{1}{\Lambda} = e^{\mu W(T) - \frac{1}{2}\mu^2 T}. \tag{10.26}$$

PROOF: The proof uses Levy’s characterization of Brownian motion, as a
continuous martingale with quadratic variation process $t$. Quadratic variation
is the same under $P$ and $Q$ by Corollary 10.14. Therefore (with a slight abuse
of notation) using the fact that $\mu t$ is smooth and does not contribute to the
quadratic variation,


$$[W, W](t) = [B(t) + \mu t, B(t) + \mu t] = [B, B](t) = t.$$

It remains to establish that $W(t)$ is a $Q$-martingale. Let $\Lambda(t) = E_P(\Lambda\,|\,\mathcal{F}_t)$.
By Corollary 10.11 to Theorem 10.10 it is enough to show that $\Lambda(t)W(t)$
is a $P$-martingale. This is done by direct calculations.


$$E_P(W(t)\Lambda(t)\,|\,\mathcal{F}_s) = E_P\big((B(t) + \mu t)\,e^{-\mu B(t) - \frac{1}{2}\mu^2 t}\,\big|\,\mathcal{F}_s\big) = W(s)\Lambda(s).$$
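
A Monte Carlo check at the terminal time gives some feeling for the theorem. Under $P$, $B(T)$ is $N(0,T)$ and $W(T) = B(T) + \mu T$ has mean $\mu T$; weighting each sample by $\Lambda = e^{-\mu B(T) - \mu^2 T/2}$ should give $E_Q W(T) = E_P(\Lambda W(T)) \approx 0$ and $E_Q W^2(T) \approx T$. The parameter values, seed and variable names below are assumptions for the sketch, and it only examines the marginal law at time $T$, not the whole path.

```python
import math
import random

mu, T, n = 0.5, 1.0, 200_000
rng = random.Random(4)

sum_lam = sum_lam_w = sum_lam_w2 = 0.0
for _ in range(n):
    b = rng.gauss(0.0, math.sqrt(T))               # B(T) under P
    lam = math.exp(-mu * b - 0.5 * mu * mu * T)    # Lambda = dQ/dP
    w = b + mu * T                                 # W(T) = B(T) + mu T
    sum_lam += lam
    sum_lam_w += lam * w
    sum_lam_w2 += lam * w * w

mean_lambda = sum_lam / n              # estimates E_P(Lambda) = 1
mean_w_under_q = sum_lam_w / n         # estimates E_Q W(T) = 0
second_moment_under_q = sum_lam_w2 / n # estimates E_Q W(T)^2 = T
print(mean_lambda, mean_w_under_q, second_moment_under_q)
```

The weighted moments match those of $N(0,T)$, consistent with $W$ being a Brownian motion under $Q$.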


It turns out that a drift of the form $\int_0^t H(s)ds$ with $\int_0^T H^2(s)ds < \infty$ can
be removed similarly by a change of measure.


Theorem 10.16 (Girsanov's theorem for removal of drift) Let B(t) be
a P-Brownian motion, and H(t) be such that $X(t) = -\int_0^t H(s)dB(s)$ is
defined, moreover $\mathcal{E}(X)$ is a martingale. Define an equivalent measure Q by

$$\Lambda = \frac{dQ}{dP}(B) = e^{-\int_0^T H(s)dB(s) - \frac{1}{2}\int_0^T H^2(s)ds} = \mathcal{E}(X)(T). \tag{10.27}$$

Then the process

$$W(t) = B(t) + \int_0^t H(s)ds \text{ is a Q-Brownian motion.} \tag{10.28}$$


PROOF: The proof is similar to the previous one and we only sketch it.
We show that $W(t) = B(t) + \int_0^t H(s)ds$ is a continuous Q-martingale with
quadratic variation t. The result then follows by Levy's characterization. The
quadratic variation of $W(t) = B(t) + \int_0^t H(s)ds$ is t, since the integral is
continuous and is of finite variation. W(t) is clearly continuous. To establish
that W(t) is a Q-martingale, by Corollary 10.11, it is enough to show that
$\Lambda(t)W(t)$ is a P-martingale, with $\Lambda(t) = E_P(\Lambda|\mathcal{F}_t)$. To this end consider

$$\Lambda(t)\mathcal{E}(W)(t) = \mathcal{E}(X)_t\mathcal{E}(W)_t = \mathcal{E}(X + W + [X, W])_t = \mathcal{E}(X + B)_t,$$

where we have used the rule for a product of semimartingale exponentials
(Theorem 8.13), $[X, W](t) = -\int_0^t H(s)ds$ and $W(t) = B(t) - [X, W](t)$. Since
X and B are both P-martingales, so is X + B. Consequently $\mathcal{E}(X + B)$ is also
a P-martingale. Hence $\Lambda(t)\mathcal{E}(W)_t$ is a P-martingale, and consequently $\mathcal{E}(W)_t$
is a Q-martingale. This implies that W(t) is also a Q-martingale.


Girsanov’s theorem holds in n dimensions. 


Theorem 10.17 Let B be an n-dimensional P-Brownian motion and
$W = (W^1(t), \ldots, W^n(t))$, where

$$W^i(t) = B^i(t) + \int_0^t H^i(s)ds$$

with $H(t) = (H^1(t), H^2(t), \ldots, H^n(t))$ a regular adapted process satisfying
$\int_0^T |H(s)|^2 ds < \infty$. Let

$$X(t) = -H \cdot B := -\sum_{i=1}^n \int_0^t H^i(s)dB^i(s)$$

and assume that $\mathcal{E}(X)$ is a martingale. Then there is an equivalent probability
measure Q, such that W is a Q-Brownian motion. Q is determined by

$$\frac{dQ}{dP}(B_{[0,T]}) = \Lambda(B_{[0,T]}) = \mathcal{E}(X)(T).$$

Note that in one dimension a sufficient condition for $\mathcal{E}(X)$ to be a martingale
is given by Theorem 8.17, $E(e^{\frac{1}{2}\int_0^T H^2(s)ds}) < \infty$, and for $\mathcal{E}(H \cdot B)$ to be a
martingale a sufficient condition is $E(e^{\frac{1}{2}\int_0^T |H(s)|^2 ds}) < \infty$.
The proof for n dimensions is similar to one dimension, using calculations
from Exercise 10.5.


We give a version of Girsanov's theorem for martingales. The proof is
similar to the one above, and is not given.


Theorem 10.18 Let $M_1(t)$, $0 \le t \le T$, be a continuous P-martingale. Let
X(t) be a continuous P-martingale such that $\mathcal{E}(X)$ is a martingale. Define a
new probability measure Q by

$$\frac{dQ}{dP} = \Lambda = \mathcal{E}(X)(T) = e^{X(T) - \frac{1}{2}[X,X](T)}. \tag{10.29}$$

Then

$$M_2(t) = M_1(t) - [M_1, X](t) \tag{10.30}$$

is a continuous martingale under Q.
A sufficient condition for $\mathcal{E}(X)$ to be a martingale is Novikov's condition
$E(e^{\frac{1}{2}[X,X](T)}) < \infty$, or Kazamaki's condition (8.40).


Change of Drift in Diffusions 


Let X(t) be a diffusion, so that with a P-Brownian motion B(t), X(t) satisfies
the following stochastic differential equation with $\sigma(x,t) > 0$,

$$dX(t) = \mu_1(X(t), t)dt + \sigma(X(t), t)dB(t). \tag{10.31}$$

Let

$$H(t) = \frac{\mu_1(X(t), t) - \mu_2(X(t), t)}{\sigma(X(t), t)} \tag{10.32}$$

and define Q by $dQ = \Lambda dP$ with

$$\Lambda = \frac{dQ}{dP} = \mathcal{E}\Big(-\int_0^{\cdot} H(t)dB(t)\Big)(T) = e^{-\int_0^T H(t)dB(t) - \frac{1}{2}\int_0^T H^2(t)dt}. \tag{10.33}$$

By Girsanov's theorem, provided the process $\mathcal{E}(H \cdot B)$ is a martingale, the
process $W(t) = B(t) + \int_0^t H(s)ds$ is a Q-Brownian motion. But

$$dW(t) = dB(t) + H(t)dt = dB(t) + \frac{\mu_1(X(t), t) - \mu_2(X(t), t)}{\sigma(X(t), t)}dt. \tag{10.34}$$

Rearranging, we obtain the equation for X(t)

$$dX(t) = \mu_2(X(t), t)dt + \sigma(X(t), t)dW(t), \tag{10.35}$$


with a Q-Brownian motion W(t). Thus the change of measure for a Brow- 
nian motion given above results in the change of the drift in the stochastic 
differential equation. 


Example 10.4: (Maximum of arithmetic Brownian motion)
Let $W(t) = \mu t + B(t)$, where B(t) is P-Brownian motion, and $W^*(t) = \max_{s \le t} W(s)$.
When $\mu = 0$ the distribution of the maximum as well as the joint distribution are
known, Theorem 3.21. We find these distributions when $\mu \ne 0$.

By Theorem 10.15 there is an equivalent measure Q under which W(t) is a Q-Brownian
motion, and $dP/dQ$ is given by (10.26). Let $A = \{W(T) \in I_1, W^*(T) \in I_2\}$, where
$I_1$, $I_2$ are intervals on the line. Then we obtain from (10.26)

$$P(A) = \int_A e^{\mu W(T) - \frac{1}{2}\mu^2 T}dQ = E_Q\big(e^{\mu W(T) - \frac{1}{2}\mu^2 T}I(A)\big). \tag{10.36}$$

Denote by $q_{W,W^*}(x,y)$ and $p_{W,W^*}(x,y)$ the joint density of $(W(T), W^*(T))$ under
Q and P respectively. Then it follows from (10.36)

$$\int\!\!\int_{\{x \in I_1, y \in I_2\}} p_{W,W^*}(x,y)dxdy = \int\!\!\int_{\{x \in I_1, y \in I_2\}} q_{W,W^*}(x,y)e^{\mu x - \frac{1}{2}\mu^2 T}dxdy. \tag{10.37}$$

Thus

$$p_{W,W^*}(x,y) = e^{\mu x - \frac{1}{2}\mu^2 T}q_{W,W^*}(x,y). \tag{10.38}$$

The density $q_{W,W^*}$ is given by (3.16). The joint density $p_{W,W^*}(x,y)$ can be computed
(see Exercise 10.6), and the distribution of the maximum is found by $\int p_{W,W^*}(x,y)dx$.


10.4 Change of Wiener Measure 


Girsanov's Theorem 10.16 states that if the Wiener measure P is changed
to Q by $dQ/dP = \Lambda = \mathcal{E}(\int_0^{\cdot} H(s)dB(s))$ where B(t) is a P-Brownian motion
and H(t) is some predictable process, then Q is equivalent to P, and
$B(t) = W(t) + \int_0^t H(s)ds$ for a Q-Brownian motion W. In this section we prove
the converse that the Radon-Nikodym derivative of any measure Q, equivalent
to the Wiener measure P, is a stochastic exponential of $\int_0^{\cdot} q(s)dB(s)$ for
some predictable process q. Using the predictable representation property of
Brownian martingales we prove the following result first.


Theorem 10.19 Let $\mathbb{F}$ be Brownian motion filtration and Y be a positive
random variable. If $EY < \infty$ then there exists a predictable process q(t) such
that

$$Y = (EY)e^{\int_0^T q(t)dB(t) - \frac{1}{2}\int_0^T q^2(t)dt}.$$


PROOF: Let $M(t) = E(Y|\mathcal{F}_t)$. Then M(t), $0 \le t \le T$, is a positive uniformly
integrable martingale. By Theorem 8.35 $M(t) = EY + \int_0^t H(s)dB(s)$. Define
q(t) = H(t)/M(t). Then we have

$$dM(t) = H(t)dB(t) = M(t)q(t)dB(t) = M(t)dX(t), \tag{10.39}$$

with $X(t) = \int_0^t q(s)dB(s)$. Thus M is a semimartingale exponential of X,
$M(t) = M(0)e^{X(t) - \frac{1}{2}[X,X](t)}$, and the result follows.


It remains to show that q is properly defined. We know that for each t,
M(t) > 0 with probability one. But there may be an exceptional set $N_t$, of
probability zero, on which M(t) = 0. As there are uncountably many t's the
union of the $N_t$'s may have a positive probability (even probability one), precluding
q being finite with a positive probability. The next result shows that this is
impossible.


Theorem 10.20 Let M(t), $0 \le t \le T$, be a martingale, such that for any t,
P(M(t) > 0) = 1. Then M(t) never hits zero on [0,T].

PROOF: Let $\tau = \inf\{t : M(t) = 0\}$. Using the Basic Stopping equation
$EM(\tau \wedge T) = EM(T)$. So we have

$$EM(T) = EM(\tau \wedge T) = EM(T)I(\tau > T) + EM(\tau)I(\tau \le T) = EM(T)I(\tau > T),$$

where the second term vanishes because $M(\tau) = 0$.
It follows that $EM(T)(1 - I(\tau > T)) = 0$. Since the random variables under
the expectation are non-negative, this implies $M(T)(1 - I(\tau > T)) = 0$ a.s.
Since M(T) > 0 a.s., $I(\tau > T) = 1$ a.s. Thus there is a null set N such that
$\tau > T$ outside N, or $P(M(t) > 0, \text{ for all } t \le T) = 1$. Since T was arbitrary,
the argument implies that $\tau = \infty$ and zero is never hit.


Remark 10.2: Note that if for some stopping time $\tau$ a non-negative martingale
is zero, $M(\tau) = 0$, and optional stopping holds, for all $t > \tau$
$E(M(t)|\mathcal{F}_\tau) = M(\tau) = 0$, then M(t) = 0 a.s. for all $t > \tau$.


Corollary 10.21 Let P be the Wiener measure, B(t) be a P-Brownian motion
and Q be equivalent to P. Then there exists a predictable process q(t), such
that

$$\Lambda(B_{[0,T]}) = \frac{dQ}{dP}(B_{[0,T]}) = e^{\int_0^T q(t)dB(t) - \frac{1}{2}\int_0^T q^2(t)dt}. \tag{10.40}$$

Moreover, $B(t) = W(t) + \int_0^t q(s)ds$, where W(t) is a Q-Brownian motion.


PROOF: By the Radon-Nikodym theorem, $dQ/dP = \Lambda$ with $0 \le \Lambda < \infty$.
Since $P(\Lambda > 0) = 1$ and $E_P\Lambda = 1$, existence of the process q(t) follows by
Theorem 10.19. Representation for B(t) as a Brownian motion with drift
under Q follows from (10.27) in Girsanov's theorem.


Corollary 10.21 is used in Finance, where q(t) denotes the market price for
risk.


10.5 Change of Measure for Point Processes 


Consider a point process N(t) with intensity $\lambda(t)$ (see Chapter 9 for definitions).
This presumes a probability space, which can be taken as the space
of right-continuous non-decreasing functions with unit jumps, and a probability
measure P so that N has intensity $\lambda(t)$ under P. Girsanov's theorem
asserts that there is a probability measure Q, equivalent to P, under which
N(t) is a Poisson process with the unit rate. Thus an equivalent change of measure
corresponds to a change in the intensity. If we look upon the compensator
$\int_0^t \lambda(s)ds$ as the drift, then an equivalent change of measure results in a change
of the drift.

Theorem 10.22 (Girsanov's theorem for Poisson processes) Let N(t)
be a Poisson process with rate 1 under P, $0 \le t \le T$. For a constant $\lambda > 0$
define Q by

$$\frac{dQ}{dP} = e^{(1-\lambda)T + N(T)\ln\lambda}. \tag{10.41}$$

Then under Q, N is a Poisson process with rate $\lambda$.


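PROOF: Let

$$\Lambda(t) = e^{(1-\lambda)t + N(t)\ln\lambda} = e^{(1-\lambda)t}\lambda^{N(t)}. \tag{10.42}$$

It is easy to see that $\Lambda(t)$ is a P-martingale with respect to the natural filtration
of the process N(t). Indeed, using independence of increments of the Poisson
process, we have

$$E_P(\Lambda(t)|\mathcal{F}_s) = e^{(1-\lambda)t}E_P(\lambda^{N(t)}|\mathcal{F}_s)$$
$$= e^{(1-\lambda)t}\lambda^{N(s)}E_P(\lambda^{N(t)-N(s)}|\mathcal{F}_s)$$
$$= e^{(1-\lambda)t}\lambda^{N(s)}E_P(\lambda^{N(t)-N(s)})$$
$$= e^{(1-\lambda)t}\lambda^{N(s)}e^{(\lambda-1)(t-s)} = e^{(1-\lambda)s}\lambda^{N(s)} = \Lambda(s).$$

We have used that the P-distribution of N(t) - N(s) is Poisson with parameter
(t - s), hence $E_P(\lambda^{N(t)-N(s)}) = e^{(\lambda-1)(t-s)}$. We show that under Q the increments
N(t) - N(s) are independent of the past and have the Poisson $\lambda(t-s)$
distribution, establishing the result. Fix u > 0. The conditional expectation
under the new measure, Theorem 10.10, and the definition of $\Lambda(t)$ (10.42) yield

$$E_Q(e^{u(N(t)-N(s))}|\mathcal{F}_s) = E_P\Big(e^{u(N(t)-N(s))}\frac{\Lambda(t)}{\Lambda(s)}\Big|\mathcal{F}_s\Big)$$
$$= e^{(1-\lambda)(t-s)}E_P(e^{(u+\ln\lambda)(N(t)-N(s))}|\mathcal{F}_s)$$
$$= e^{(1-\lambda)(t-s)}E_P(e^{(u+\ln\lambda)(N(t)-N(s))})$$
$$= e^{\lambda(t-s)(e^u-1)},$$

which is the moment generating function of the Poisson $\lambda(t-s)$ distribution.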
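
At the terminal time the change of rate can be verified directly on the probability mass function. The sketch below is an illustration under stated assumptions, not part of the original text: it multiplies the Poisson(T) probabilities of N(T) under P by the likelihood (10.41) and compares the result with the Poisson($\lambda T$) probabilities. The function names and the values of lam and T are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """P(N = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def reweighted_pmf(k, lam, T):
    """P(N(T) = k) under P (rate 1) multiplied by the likelihood (10.41)."""
    likelihood = math.exp((1.0 - lam) * T + k * math.log(lam))
    return poisson_pmf(k, T) * likelihood

lam, T = 2.5, 1.5  # illustrative rate and horizon
# Largest discrepancy between the reweighted law and Poisson(lam * T) over k = 0..39
max_gap = max(abs(reweighted_pmf(k, lam, T) - poisson_pmf(k, lam * T))
              for k in range(40))
```

The discrepancy is at the level of rounding error, since $e^{-T}T^k/k! \cdot e^{(1-\lambda)T}\lambda^k = e^{-\lambda T}(\lambda T)^k/k!$.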


Theorem 10.23 Let N(t), $0 \le t \le T$, be a Poisson process with rate $\lambda$ under
Q. Define P by

$$\frac{dP}{dQ} = e^{(\lambda-1)T - N(T)\ln\lambda}. \tag{10.43}$$

Then under P, N(t) is a Poisson process with rate 1.


As a corollary we obtain that the measures of Poisson processes with constant
rates are equivalent with Likelihood ratio given by the following theorem.

Theorem 10.24 Let N be a Poisson process with rate $\lambda > 0$ under the probability
$P_\lambda$. Then it is a Poisson process with rate $\mu$ under the equivalent measure
$P_\mu$, defined by

$$\frac{dP_\mu}{dP_\lambda} = e^{(\lambda-\mu)T - N(T)(\ln\lambda - \ln\mu)}. \tag{10.44}$$


Note that the key point in changing measure was the martingale property
(under the original measure) of the Likelihood $\Lambda(t)$. This property was established
directly in the proofs. Observe that the Likelihood is the stochastic
exponential of the point process martingale M(t) = N(t) - A(t), where A(t) is the
compensator of N. For example, in changing the rate from 1 to $\lambda$,
$\Lambda = \mathcal{E}((\lambda-1)M)$ with M(t) = N(t) - t. It
turns out that a general point process N(t) with intensity $\lambda(t)$ can be obtained
from a Poisson process with rate 1 by a change of measure. In fact, Theorem
10.24 holds for general point processes with stochastic intensities. The form of
the stochastic exponential $\mathcal{E}((\lambda-1) \cdot M)$ for non-constant $\lambda$ is not hard to obtain,
see Exercise 10.8,

$$\Lambda = \mathcal{E}\Big(\int_0^{\cdot}(\lambda(s) - 1)dM(s)\Big)(T) = e^{\int_0^T (1-\lambda(s))ds + \int_0^T \ln\lambda(s)dN(s)}. \tag{10.45}$$


The next result establishes that a point process is a Poisson process (with 
a constant rate) under a suitable change of measure. 


Theorem 10.25 Let N(t) be a Poisson process with rate 1 under P, and
M(t) = N(t) - t. If for a predictable process $\lambda(s)$, $\mathcal{E}(\int_0^{\cdot}(\lambda(s) - 1)dM(s))$ is
a martingale for $0 \le t \le T$, then under the probability measure Q defined by
(10.45) $dQ/dP = \Lambda$, N is a point process with the stochastic intensity $\lambda(t)$.
Conversely, if Q is absolutely continuous with respect to P then there exists
a predictable process $\lambda(t)$, such that under Q, N is a point process with the
stochastic intensity $\lambda(t)$.


10.6 Likelihood Functions 


When observations X are made from competing models described by the prob- 
abilities P and Q, the Likelihood is the Radon-Nikodym derivative A = dQ/dP. 


Likelihood for Discrete Observations 


Suppose that we observe a discrete random variable X, and there are two
competing models for X: it can come from distribution P or Q. For the
observed number x the Likelihood is given by

$$\Lambda(x) = \frac{Q(X = x)}{P(X = x)}. \tag{10.46}$$

Small values of $\Lambda(x)$ provide evidence for model P, and large values for model
Q. If X is a continuous random variable with densities $f_0(x)$ under P and
$f_1(x)$ under Q, then $dP = f_0(x)dx$, $dQ = f_1(x)dx$, and the Likelihood is given
by

$$\Lambda(x) = \frac{f_1(x)}{f_0(x)}. \tag{10.47}$$

If the observed data are a finite number of observations $x_1, x_2, \ldots, x_n$ then
similarly with $x = (x_1, x_2, \ldots, x_n)$

$$\Lambda(x) = \frac{Q(X = x)}{P(X = x)}, \text{ or } \Lambda(x) = \frac{f_1(x)}{f_0(x)}, \tag{10.48}$$


depending on whether the models are discrete or continuous. Note that if one 
model is continuous and the other is discrete, then the corresponding measures 
are singular and the Likelihood does not exist. 
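
As an illustration of (10.47) and (10.48), the following sketch computes the Likelihood for a few observations under two unit-variance normal models. It assumes, beyond what the text states, that the observations are independent, so that the joint densities factorize; the data values and function names are made up for the example. The closed form used for comparison is the one stated in Exercise 10.1.

```python
import math

def normal_density(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def likelihood(xs, f0, f1):
    """Lambda(x) = prod f1(x_i) / prod f0(x_i), assuming independent observations."""
    ratio = 1.0
    for x in xs:
        ratio *= f1(x) / f0(x)
    return ratio

mu1, mu2 = 0.0, 1.0            # model P is N(mu1, 1), model Q is N(mu2, 1)
data = [0.3, 1.4, -0.2, 0.9]   # made-up observations
lam = likelihood(data,
                 lambda x: normal_density(x, mu1),
                 lambda x: normal_density(x, mu2))
# Closed form from Exercise 10.1, applied to each observation and multiplied
closed_form = math.exp(sum((mu2 - mu1) * x + 0.5 * (mu1 ** 2 - mu2 ** 2)
                           for x in data))
```

Here the Likelihood exceeds 1, so these particular data give more support to model Q than to model P.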


Likelihood Ratios for Diffusions 
Let X be a diffusion solving the SDE with a P-Brownian motion B(t)

$$dX(t) = \mu_1(X(t), t)dt + \sigma(X(t), t)dB(t). \tag{10.49}$$

Suppose that it satisfies another equation with a Q-Brownian motion W(t)

$$dX(t) = \mu_2(X(t), t)dt + \sigma(X(t), t)dW(t). \tag{10.50}$$

The Likelihood, as we have seen, is given by (10.33)

$$\Lambda(X_{[0,T]}) = \frac{dQ}{dP} = e^{\int_0^T \frac{\mu_2(X(t),t) - \mu_1(X(t),t)}{\sigma(X(t),t)}dB(t) - \frac{1}{2}\int_0^T \frac{(\mu_2(X(t),t) - \mu_1(X(t),t))^2}{\sigma^2(X(t),t)}dt}. \tag{10.51}$$

Since B(t) is not observed directly, the Likelihood should be expressed as
a function of the observed path $X_{[0,T]}$. Using equation (10.49) we obtain
$dB(t) = \frac{dX(t) - \mu_1(X(t),t)dt}{\sigma(X(t),t)}$, and putting this into the Likelihood, we obtain

$$\Lambda(X)_T = \frac{dQ}{dP} = e^{\int_0^T \frac{\mu_2(X(t),t) - \mu_1(X(t),t)}{\sigma^2(X(t),t)}dX(t) - \frac{1}{2}\int_0^T \frac{\mu_2^2(X(t),t) - \mu_1^2(X(t),t)}{\sigma^2(X(t),t)}dt}. \tag{10.52}$$

Using the Likelihood a decision can be made as to what model is more appropriate
for X.


Example 10.5: (Hypotheses testing)
Suppose that we observe a continuous function $x_t$, $0 \le t \le T$, and we want to
know whether it is just White noise, the null hypothesis corresponding to probability
measure P, or it is some signal contaminated by noise, the alternative hypothesis
corresponding to probability measure Q.


$H_0$: noise: dX(t) = dB(t), and $H_1$: signal+noise: dX(t) = h(t)dt + dB(t).


Here we have $\mu_1(x) = 0$, $\mu_2(x) = h(t)$, $\sigma(x) = 1$. The Likelihood is given by

$$\Lambda(X)_T = \frac{dQ}{dP} = e^{\int_0^T h(t)dX(t) - \frac{1}{2}\int_0^T h^2(t)dt}. \tag{10.53}$$

This leads to the Likelihood ratio test of the form: conclude the presence of signal if
$\Lambda \ge k$, where k is determined from setting the probability of the type one error to
$\alpha$,

$$P\Big(e^{\int_0^T h(t)dB(t) - \frac{1}{2}\int_0^T h^2(t)dt} \ge k\Big) = \alpha. \tag{10.54}$$
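
When the path is observed on a discrete time grid, the Likelihood (10.53) can be approximated by replacing the integrals with sums. The sketch below is one such discretization (left-point sums with a uniform step dt); the grid, the constant signal h and the two test paths are assumptions made for illustration and do not come from the text.

```python
def log_likelihood_signal(xs, h, dt):
    """Discretised log of (10.53): sum h(t_i) dX_i - 0.5 * sum h(t_i)^2 dt."""
    stochastic_part = 0.0
    energy = 0.0
    for i in range(len(xs) - 1):
        t = i * dt
        stochastic_part += h(t) * (xs[i + 1] - xs[i])
        energy += h(t) ** 2 * dt
    return stochastic_part - 0.5 * energy

n, T = 1000, 1.0
dt = T / n
h = lambda t: 1.0                             # hypothetical constant signal
pure_signal = [i * dt for i in range(n + 1)]  # noise-free path X(t) = t
flat_path = [0.0] * (n + 1)                   # path with neither signal nor noise
log_lam_signal = log_likelihood_signal(pure_signal, h, dt)  # close to +0.5
log_lam_flat = log_likelihood_signal(flat_path, h, dt)      # close to -0.5
```

On the noise-free signal path the log Likelihood is positive and on the flat path it is negative, in line with large values of the Likelihood supporting the alternative. With real data the path would contain noise and the threshold k would be set from (10.54).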


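Example 10.6: (Estimation in Ornstein-Uhlenbeck Model)
Consider estimation of the friction parameter $\alpha$ in the Ornstein-Uhlenbeck model on
$0 \le t \le T$,

$$dX(t) = -\alpha X(t)dt + \sigma dB(t).$$

Denote by $P_\alpha$ the measure corresponding to X(t), so that $P_0$ corresponds to $\sigma B(t)$,
$0 \le t \le T$. The Likelihood is given by

$$\Lambda(\alpha, X_{[0,T]}) = \frac{dP_\alpha}{dP_0} = \exp\Big(-\int_0^T \frac{\alpha X(t)}{\sigma^2}dX(t) - \frac{1}{2}\int_0^T \frac{\alpha^2 X^2(t)}{\sigma^2}dt\Big).$$

Maximizing log Likelihood, we find

$$\hat{\alpha} = -\frac{\int_0^T X(t)dX(t)}{\int_0^T X^2(t)dt}. \tag{10.55}$$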
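
For a path sampled on a uniform grid, the estimator (10.55) can be approximated by replacing the stochastic integral with a left-point (Ito) sum. The following sketch shows this discretization; the function name, step size and the noise-free test path $X(t) = e^{-\alpha t}$ are illustrative assumptions, chosen so that the answer is known in advance.

```python
import math

def ou_mle(xs, dt):
    """Discretised (10.55): -sum X_i (X_{i+1} - X_i) / (sum X_i^2 * dt)."""
    num = sum(xs[i] * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1))
    den = sum(xs[i] ** 2 for i in range(len(xs) - 1)) * dt
    return -num / den

alpha, dt, n = 2.0, 1e-3, 2000
# Noise-free limit of the model: X(t) = exp(-alpha * t) solves dX = -alpha X dt
path = [math.exp(-alpha * i * dt) for i in range(n + 1)]
alpha_hat = ou_mle(path, dt)   # equals (1 - exp(-alpha*dt)) / dt, close to alpha
```

On this path the discretized estimator differs from the true value only by the discretization error of order dt. On a simulated noisy path it would additionally carry a statistical error that decreases as T grows.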


Remark 10.3: Let X(t) and Y(t) satisfy the stochastic differential equations
for $0 \le t \le T$

$$dX(t) = \mu_X(X(t), t)dt + \sigma_X(X(t), t)dW(t), \tag{10.56}$$

and

$$dY(t) = \mu_Y(Y(t), t)dt + \sigma_Y(Y(t), t)dW(t). \tag{10.57}$$

Consider probability measures induced by these diffusions on the space of
continuous functions on [0, T], C[0, T]. It turns out that if $\sigma_X \ne \sigma_Y$ then
$P_X$ and $P_Y$ are singular ("live" on different sets). It means that by observing
a process continuously over an interval of time we can decide precisely from
which equation it comes. This identification can be made with the help of the
quadratic variation process,

$$d[X, X](t) = \sigma^2(X(t), t)dt, \text{ and } \sigma^2(X(t), t) = \frac{d[X, X](t)}{dt}, \tag{10.58}$$

and [X, X] is known exactly if a path is observed continuously.
If $\sigma_X = \sigma_Y = \sigma$ then $P_X$ and $P_Y$ are equivalent. The Radon-Nikodym
derivatives (the Likelihoods) are given by

$$\frac{dP_Y}{dP_X}(X_{[0,T]}) = e^{\int_0^T \frac{\mu_Y(X(t),t) - \mu_X(X(t),t)}{\sigma^2(X(t),t)}dX(t) - \frac{1}{2}\int_0^T \frac{\mu_Y^2(X(t),t) - \mu_X^2(X(t),t)}{\sigma^2(X(t),t)}dt}, \tag{10.59}$$

and

$$\frac{dP_X}{dP_Y}(Y_{[0,T]}) = e^{-\int_0^T \frac{\mu_Y(Y(t),t) - \mu_X(Y(t),t)}{\sigma^2(Y(t),t)}dY(t) + \frac{1}{2}\int_0^T \frac{\mu_Y^2(Y(t),t) - \mu_X^2(Y(t),t)}{\sigma^2(Y(t),t)}dt}. \tag{10.60}$$


Notes. Material for this chapter is based on Karatzas and Shreve (1988), 
Liptser and Shirayev (1974), Lamberton and Lapeyre (1996). 


10.7 Exercises 


Exercise 10.1: Let P be $N(\mu_1, 1)$ and Q be $N(\mu_2, 1)$ on $\mathbb{R}$. Show that they
are equivalent and that the Radon-Nikodym derivative $dQ/dP = \Lambda$ is given
by $\Lambda(x) = e^{(\mu_2-\mu_1)x + \frac{1}{2}(\mu_1^2-\mu_2^2)}$. Give also dP/dQ.


Exercise 10.2: Show that if X has $N(\mu, 1)$ distribution under P, then there is
an equivalent measure Q, such that X has N(0, 1) distribution under Q. Give
the Likelihood dQ/dP and also dP/dQ. Give the Q-distribution of $Y = X - \mu$.


Exercise 10.3: Y has a Lognormal $LN(\mu, \sigma^2)$ distribution. Using a change
of measure calculate EYI(Y > K). Hint: change measure from $N(\mu, \sigma^2)$ to
$N(\mu + \sigma^2, \sigma^2)$.


Exercise 10.4: Let X(t) = B(t) + sin t for a P-Brownian motion B(t). Let
Q be an equivalent measure to P such that X(t) is a Q-Brownian motion. Give
$\Lambda = dQ/dP$.

Exercise 10.5: Let B be an n-dimensional Brownian motion and H an adapted
regular process. Let $H \cdot B(T) = \sum_{i=1}^n \int_0^T H^i(s)dB^i(s)$ be a martingale. Show
that the martingale exponential is given by $\exp(H \cdot B(T) - \frac{1}{2}\int_0^T |H(s)|^2 ds)$.
Hint: show that quadratic variation of $H \cdot B$ is given by $\int_0^T |H(s)|^2 ds$, where
$|H(s)|^2$ denotes the squared length of vector H(s).


Exercise 10.6: Let $W_\mu(t) = \mu t + B(t)$, where B(t) is a P-Brownian motion.
Show that

$$P\big(\max_{t \le T} W_\mu(t) \le y \,\big|\, W_\mu(T) = x\big) = 1 - e^{\frac{-2y(y-x)}{T}}, \quad x \le y,$$

and

$$P\big(\min_{t \le T} W_\mu(t) \ge y \,\big|\, W_\mu(T) = x\big) = 1 - e^{\frac{-2y(y-x)}{T}}, \quad x \ge y.$$

Hint: use the joint distributions (10.38) and (3.16).

Exercise 10.7: Prove Theorem 10.23.


Exercise 10.8: Let N(t) be a Poisson process with rate 1 and $\bar{N}(t) = N(t) - t$.
Show that for an adapted, continuous and bounded process H(t), the process
$M(t) = \int_0^t H(s)d\bar{N}(s)$ is a martingale for $0 \le t \le T$. Show that

$$\mathcal{E}(M)(t) = e^{-\int_0^t H(s)ds + \int_0^t \ln(1+H(s))dN(s)}.$$


Exercise 10.9: (Estimation of parameters) 

Find the Likelihood corresponding to different values of $\mu$ of the process X(t)
given by $dX(t) = \mu X(t)dt + \sigma X(t)dB(t)$ on [0, T]. Give the maximum Likelihood
estimator.


Exercise 10.10: Verify the martingale property of the Likelihood occurring 
in the change of rate in a Poisson process. 


Exercise 10.11: Let $dQ = \Lambda dP$ on $\mathcal{F}_T$, where $\Lambda(t) = E_P(\Lambda|\mathcal{F}_t)$ is continuous. For
a P-martingale M(t) find a finite variation process A(t) such that M'(t) =
M(t) - A(t) is a Q-local martingale.


Exercise 10.12: Let B and N be respectively a Brownian motion and a
Poisson process on the same space $(\Omega, \mathcal{F}, \mathbb{F}, P)$, $0 \le t \le T$. Define
$\Lambda(t) = e^{(\ln 2)N(t) - t}$ and $dQ = \Lambda(T)dP$. Show that B is a Brownian motion under Q.


Exercise 10.13:


1. Let B and N be respectively a Brownian motion and a Poisson process
on the same space $(\Omega, \mathcal{F}, \mathbb{F}, P)$, $0 \le t \le T$, and X(t) = B(t) + N(t).
Give an equivalent probability measure $Q_1$ such that B(t) + t and N(t) - t
are $Q_1$-martingales. Deduce that X(t) is a $Q_1$-martingale.


2. Give an equivalent probability measure $Q_2$ such that B(t) + 2t and N(t) - 2t
are $Q_2$-martingales. Deduce that X(t) is a $Q_2$-martingale.


3. Deduce that there are infinitely many equivalent probability measures Q
such that X(t) = B(t) + N(t) is a Q-martingale.


Chapter 11 


Applications in Finance: 
Stock and FX Options 


In this chapter the fundamentals of Mathematics of Option Pricing are given. 
The concept of arbitrage is introduced, and a martingale characterization of 
models that don’t admit arbitrage is given, the First Fundamental Theorem 
of asset pricing. The theory of pricing by no-arbitrage is presented first in 
the Finite Market model, and then in a general Semimartingale Model, where 
the martingale representation property is used. Change of measure and its 
application as the change of numeraire are given as a corollary to Girsanov’s 
theorem and general Bayes formula for expectations. They represent the main 
techniques used for pricing foreign exchange options, exotic options (Asian,
lookback, barrier options) and interest rate options.


11.1 Financial Derivatives and Arbitrage 


A financial derivative or a contingent claim on an asset is a contract that allows 
purchase or sale of this asset in the future on terms that are specified in the 
contract. An option on stock is a basic example. 


Definition 11.1 A call option on stock is a contract that gives its holder the 
right to buy this stock in the future at the price K written in the contract, 
called the exercise price or the strike. 


A European call option allows the holder to exercise the contract (that is, 
to buy this stock at K) at a particular date T, called the maturity or the 
expiration date. An American option allows the holder to exercise the contract 
at any time before or at T.

A contract that gives its holder the right to sell stock is called a put option.
There are two parties to a contract, the seller (writer) and the buyer (holder). 
The holder of the options has the right but no obligations in exercising the 
contract. The writer of the options has the obligation to abide by the contract, 
for example, they must sell the stock at price K if the call option is exercised. 
Denote the price of the asset (e.g. stock) at time t by S(t). A contingent claim
on this asset has its value at maturity specified in the contract, and it is some
function of the values S(t), $0 \le t \le T$. Simple contracts depend only on the
value at maturity S(T).


Example 11.1: (Value at maturity of European call and put options) 

If you hold a call option, then you can buy one share of stock at T for K. If at time 
T, S(T) < K, you will not exercise your option, as you can buy stock cheaper than 
K, thus this option is worthless. If at time T, S(T) > K, then you can buy one 
share of stock for K and sell it immediately for S(T) making profit of S(T) — K. 
Thus a European call option has the value at time T

$$C(T) = (S(T) - K)^+ = \max(0, S(T) - K). \tag{11.1}$$

Similar considerations reveal that the value of the European put at maturity is

$$P(T) = (K - S(T))^+ = \max(0, K - S(T)). \tag{11.2}$$


Example 11.2: (Exotic Options) 

The value at maturity of Exotic Options depends on prices of the asset on the whole
time interval before maturity, S(t), $t \le T$.

1. Lookback Options. Lookback call pays at T, $X = (S(T) - S_{\min})^+ = S(T) - S_{\min}$
and lookback put $X = S_{\max} - S(T)$, where $S_{\min}$ and $S_{\max}$ denote the smallest and
the largest values of S(t) on [0, T].

2. Barrier Options. The call that gets knocked out when the price falls below a
certain level H (down and out call) pays at T

$$X = (S(T) - K)^+ I\big(\min_{0 \le t \le T} S(t) \ge H\big), \quad S(0) > H, \ K > H.$$

3. Asian Options. Payoff at T depends on the average price $\bar{S} = \frac{1}{T}\int_0^T S(u)du$
during the life of the option. Average call pays $X = (\bar{S} - K)^+$, and average put
$X = (K - \bar{S})^+$. Random strike option pays $X = (S(T) - \bar{S})^+$.
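
The payoffs of Examples 11.1 and 11.2 can be evaluated on a price path observed at finitely many times. The sketch below is an illustration only: the path values are invented, the function names are hypothetical, and the time average in the Asian payoffs is approximated by the arithmetic mean of the sampled prices rather than the integral used in the text.

```python
def call_payoff(path, K):
    """European call (11.1): (S(T) - K)^+."""
    return max(0.0, path[-1] - K)

def put_payoff(path, K):
    """European put (11.2): (K - S(T))^+."""
    return max(0.0, K - path[-1])

def lookback_call_payoff(path):
    return path[-1] - min(path)

def lookback_put_payoff(path):
    return max(path) - path[-1]

def down_and_out_call_payoff(path, K, H):
    """Call that is knocked out if the sampled price ever falls below H."""
    return call_payoff(path, K) if min(path) >= H else 0.0

def average_price(path):
    """Arithmetic mean of sampled prices, a discrete stand-in for the time average."""
    return sum(path) / len(path)

def asian_call_payoff(path, K):
    return max(0.0, average_price(path) - K)

def asian_put_payoff(path, K):
    return max(0.0, K - average_price(path))

def random_strike_payoff(path):
    return max(0.0, path[-1] - average_price(path))

path = [10.0, 12.0, 8.0, 11.0]   # made-up sampled prices S(t_0), ..., S(T)
K = 10.0
```

For this path the European call pays 1, the lookback call pays 3, and the down and out call with barrier H = 9 pays nothing because the price dipped to 8.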


Whereas the value of a financial claim at maturity can be obtained from 
the terms specified in the contract, it is not so clear how to obtain its value 
prior to maturity. This is done by the pricing theory, which is also used to 
manage financial risk. 

If the payoff of an option depends only on the price of stock at expiration, 
then it is possible to graph the value of an option at expiration against the 
underlying price of stock. For example, the payoff function of a call option is
$(x - K)^+$. Some payoff functions are given in Exercise 11.1.


Arbitrage and Fair Price


Arbitrage is defined in Finance as a strategy that allows one to make a profit out
of nothing without taking any risk. Mathematical formulation of arbitrage is
given later in the context of a market model.


Example 11.3: (Futures, or Forward) Consider a contract that gives the holder 1
share of stock at time T, so its value at time T is the market price S(T). Denote
the price of this contract at time 0 by C(0). The following argument shows that
C(0) = S(0), as any other value results in an arbitrage profit.

If C(0) > S(0), then sell the contract and receive C(0); buy the stock and pay
S(0). The difference C(0) - S(0) > 0 can be invested in a risk-free bank account
with interest rate r. At time T you have the stock to deliver, and make arbitrage
profit of $(C(0) - S(0))e^{rT}$. If C(0) < S(0), the reverse strategy results in arbitrage
profit: you buy the contract and pay C(0); sell the stock and receive S(0) (selling
the stock when you don't hold it is called short selling and it is allowed). Invest
the difference S(0) - C(0) > 0 in a risk-free bank account. At time T exercise the
contract to receive the stock, worth S(T), and use it to close the short position.
The profit is $(S(0) - C(0))e^{rT}$. Thus any
price C(0) different from S(0) results in arbitrage profit. The only case which does
not result in arbitrage profit is C(0) = S(0).


The fair price in a game of chance with a profit X (negative profit is loss) is 
the expected profit EX. The above example also demonstrates that financial 
derivatives are not priced by their expectations, and in general, their arbitrage- 
free value is different to their fair price. 

The following example uses a two-point distribution for the stock on expiration
to illustrate that prices for options cannot be taken as expected payoffs,
and that all the prices but one lead to arbitrage opportunities.


Example 11.4: The current price of the stock is $10. We want to price a call option
on this stock with the exercise price K = 10 and expiration in one period. Suppose
that the stock price at the end of the period can have only two values $8 and $12
per share, and that the riskless interest rate is 10%. Suppose the call is priced at $1
per share. Consider the strategy: buy call option on 200 shares and sell 100 shares
of stock.


                              Now     S_T = 12    S_T = 8

Buy option on 200 shares     -200        400          0
Sell (short) 100 shares      1000      -1200       -800
Savings account              -800        880        880
Profit                          0        +80        +80


We can see that in either case $S_T = 8$ or $S_T = 12$ an arbitrage profit of $80 is
realized. It can be seen, by reversing the above strategy, that any price above $1.36
will result in arbitrage. The price that does not lead to arbitrage is $1.36.
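
The table can be reproduced for any quoted call price. The following sketch, with hypothetical function and parameter names, computes the profit at expiration of the strategy "buy options on 200 shares, short 100 shares, put the net proceeds in the savings account" for both terminal stock prices.

```python
def strategy_profit(call_price, s_T, strike=10.0, s0=10.0, rate=1.1,
                    n_options=200, n_short=100):
    """Profit at expiration of the strategy of Example 11.4.

    The net cash at time 0 (short sale proceeds minus option cost) is
    deposited in the savings account and grows by the factor `rate`.
    """
    cash = n_short * s0 - n_options * call_price
    option_value = n_options * max(0.0, s_T - strike)
    return option_value - n_short * s_T + cash * rate

# Call priced at $1: the profit is about $80 in both states
profits_at_1 = [strategy_profit(1.0, s) for s in (12.0, 8.0)]

# Call priced at 2 * 0.75 / 1.1 (about $1.36): the profit vanishes in both states
fair = 2.0 * 0.75 / 1.1
profits_at_fair = [strategy_profit(fair, s) for s in (12.0, 8.0)]
```

The profit does not depend on the terminal stock price because 200 options and 100 shorted shares offset each other's exposure in this two-point model.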
Equivalence Portfolio. Pricing by No Arbitrage


The main idea in pricing by no-arbitrage arguments is to replicate the payoff 
of the option at maturity by a portfolio consisting of stock and bond (cash). 
To avoid arbitrage, the price of the option at any other time must equal the 
value of the replicating portfolio, which is valued by the market. Consider the 
one period case T = 1 first, and assume that in one period stock price moves
up by factor u or down by d, d < 1 < u, and we want to price a claim C that
has the values $C_u$ and $C_d$ on maturity. Schematically the following trees for
prices of the stock and the option are drawn.


        uS                $C_u$   if the price goes up
    S               C
        dS                $C_d$   if the price goes down


Note that the values of the option $C_u$ and $C_d$ are known, since the values of
the stock uS and dS are known by assumption. A portfolio that replicates the
options consists of a shares of stock and b of bond (cash in savings account).
After one period the value of this portfolio is

$$aS_T + br = \begin{cases} auS + br & \text{if } S_T = uS \\ adS + br & \text{if } S_T = dS. \end{cases}$$

Since this portfolio is equivalent to the option, by matching the payoff, we
have a system of two linear equations for a and b,

$$auS + br = C_u, \qquad adS + br = C_d,$$

with the solution

$$a = \frac{C_u - C_d}{(u-d)S}, \qquad b = \frac{uC_d - dC_u}{(u-d)r}. \tag{11.3}$$

The price of the option must equal that of the replicating portfolio

$$C = aS + b, \tag{11.4}$$


with a and b from (11.3). To prove that the price is given by (11.4) consider
cases when the price is above and below that value C. If an option is priced
above C, then selling the option and buying the portfolio specified by a and
b results in arbitrage. If the option is priced below C then buying the option
and selling the portfolio results in arbitrage. It will be seen later that there
are no arbitrage strategies when the option is priced at C.

Example 11.5: (continued) K = 10, $C_T = (S_T - K)^+$, r = 1.1.


        12  (u = 1.2)            2 = $C_u$
    10                     C
         8  (d = 0.8)            0 = $C_d$


a = 0.5, b = -3.64, from (11.3). Thus this option is replicated by the portfolio
consisting of 0.5 shares and borrowing 3.64 dollars. Initial value of this portfolio is
C = 0.5 · 10 - 3.64 = 1.36, which gives the no-arbitrage price for the call option.


The formula for the price of the option C (11.4) can be written as

$$C = aS + b = \frac{1}{r}\big(pC_u + (1-p)C_d\big), \tag{11.5}$$

with

$$p = \frac{r-d}{u-d}. \tag{11.6}$$

It can be viewed as the discounted expected payoff of the claim, with probability
p of up and (1 - p) down movements. The probability p, calculated from the
given returns of the stock by (11.6), is called the arbitrage-free or risk-neutral
probability, and has nothing to do with subjective probabilities of market going
up or down.


In the above example $p = \frac{1.1 - 0.8}{1.2 - 0.8} = 0.75$. So that $C = \frac{1}{1.1} \cdot 2 \cdot 0.75 = 1.36$.
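
The one-period calculation can be carried out mechanically from (11.3) to (11.6). The sketch below uses hypothetical function names and the numbers of the running example; it computes the replicating portfolio and checks that its cost agrees with the discounted risk-neutral expectation.

```python
def replicating_portfolio(S, u, d, r, C_u, C_d):
    """Shares a and bond holding b from (11.3)."""
    a = (C_u - C_d) / ((u - d) * S)
    b = (u * C_d - d * C_u) / ((u - d) * r)
    return a, b

def risk_neutral_price(u, d, r, C_u, C_d):
    """Price (11.5) together with the risk-neutral probability (11.6)."""
    p = (r - d) / (u - d)
    return (p * C_u + (1.0 - p) * C_d) / r, p

S, u, d, r = 10.0, 1.2, 0.8, 1.1
C_u, C_d = 2.0, 0.0                       # call payoffs with K = 10
a, b = replicating_portfolio(S, u, d, r, C_u, C_d)
price_by_replication = a * S + b          # formula (11.4)
price_by_expectation, p = risk_neutral_price(u, d, r, C_u, C_d)
```

Both routes give a price of about 1.36 with a = 0.5, b about -3.64 and p = 0.75, matching the values in the example.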


Binomial Model 


The one-period formula can be applied recursively to price a claim when trading
is done one period after another. The tree for option's prices in a two-period
model is given by


                    $C_{uu}$
           $C_u$
                    $C_{ud}$
    C
                    $C_{du}$
           $C_d$
                    $C_{dd}$


The final prices of the option are known from the assumed model for stock
prices. The prices prior to expiration are obtained recursively, by using the
one-period formula (11.5),

$$C_u = \frac{1}{r}\big(pC_{uu} + (1-p)C_{ud}\big), \qquad C_d = \frac{1}{r}\big(pC_{du} + (1-p)C_{dd}\big).$$

Using the formula (11.5) again

$$C = \frac{1}{r}\big(pC_u + (1-p)C_d\big) = \frac{1}{r^2}\big(p^2C_{uu} + p(1-p)C_{ud} + (1-p)pC_{du} + (1-p)^2C_{dd}\big).$$

In the n period model, T = n, if $C_{udu\ldots du} = C_{u\ldots ud\ldots d}$, then continuing by
induction

$$C = \frac{1}{r^n}\sum_{i=0}^n \binom{n}{i}p^i(1-p)^{n-i}C_{\underbrace{u\ldots u}_{i}\underbrace{d\ldots d}_{n-i}} \tag{11.7}$$

is today's price of a claim which is to be exercised n periods from now. C can
be seen as the expected payoff, expressed in the now dollar value, of the final
payoff $C_n$, when the probability of the market going up on each period is the
arbitrage-free probability p. For a call option

$$C_{\underbrace{u\ldots u}_{i}\underbrace{d\ldots d}_{n-i}} = (u^i d^{n-i}S - K)^+,$$

and C can be written by using the complementary Binomial cumulative probability
distribution function $Bin(j; n, p) = P(S_n \ge j)$

$$C = S\,Bin(j; n, p') - r^{-n}K\,Bin(j; n, p),$$

where $j = \Big[\frac{\ln(K/(Sd^n))}{\ln(u/d)}\Big] + 1$ and $p' = \frac{u}{r}p$.
It is possible to obtain the option pricing formula of Black and Scholes 
from the Binomial formula by taking limits as the length of the trading period 


goes to zero and the number of trading periods n goes to infinity. 
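
Formula (11.7) and the backward recursion that leads to it can be compared numerically. The following sketch prices a call in the n-period model both ways; the function names are hypothetical and the parameters are those of the running one-period example, reused for several periods as an assumption.

```python
import math

def binomial_price_formula(S, K, u, d, r, n):
    """Call price from (11.7): discounted risk-neutral expectation of the payoff."""
    p = (r - d) / (u - d)
    total = 0.0
    for i in range(n + 1):                    # i = number of up moves
        payoff = max(0.0, u ** i * d ** (n - i) * S - K)
        total += math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) * payoff
    return total / r ** n

def binomial_price_recursive(S, K, u, d, r, n):
    """Call price by applying the one-period formula (11.5) backwards n times."""
    p = (r - d) / (u - d)
    # values[i] is the option value at a node reached with i up moves
    values = [max(0.0, u ** i * d ** (n - i) * S - K) for i in range(n + 1)]
    for _ in range(n):
        values = [(p * values[i + 1] + (1.0 - p) * values[i]) / r
                  for i in range(len(values) - 1)]
    return values[0]

S, K, u, d, r = 10.0, 10.0, 1.2, 0.8, 1.1
one_period = binomial_price_formula(S, K, u, d, r, 1)      # about 1.36
three_formula = binomial_price_formula(S, K, u, d, r, 3)
three_recursive = binomial_price_recursive(S, K, u, d, r, 3)
```

The two three-period prices coincide up to rounding, and the one-period price reproduces the value 1.36 found earlier. The recursion is the more general of the two, since it does not require the payoff to depend only on the number of up moves per node level in any closed form.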

Pricing by No Arbitrage

Given a general model these are the questions we ask.


1. If we have a model for evolution of prices, how can we tell if there are 
arbitrage opportunities? Not finding any is a good start, but not a proof 
that there are none. 


2. If we know that there are no arbitrage opportunities in the market, how 
do we price a claim, such as an option? 


3. Can we price any option, or are there some that cannot be priced by 
arbitrage arguments? 


The answers are given by two main results, called the Fundamental Theorems 
of asset pricing. In what follows we outline the mathematical theory of pricing 
of claims in finite and general market models.


11.2 A Finite Market Model


Consider a model with one stock with price S(t) at time t, and a riskless 
investment (bond, or cash in a savings account) with price G(t) at time t. If 
the riskless rate of investment is a constant r > 1 then @(t) = r*G(0). 

A market model is called finite if S(t),t = 0,...,7 take finitely many 
values. A portfolio (a(t), b(t)) is the number of shares of stock and bond units 
held during |t — 1,¢). The information available after observing prices up to 
time t is denoted by the o-field F. The portfolio is decided on the basis 
of information at time t — 1, in other words a(t), b(t) are F;_1 measurable, 
t=1,...,7, or in our terminology they are predictable processes. The change 
in market value of the portfolio at time t is the difference between its value 
after it has been established at time t — 1 and its value after prices at t are 
observed, namely 


a(t) S(t) + b(t) G(t) — a(t) S(t — 1) — b(t) Bt — 1) = a(t) AS (t) +b) Ab). 


A portfolio (trading strategy) is called self-financing if all the changes in the 
portfolio are due to gains realized on investment, that is, no funds are borrowed 
or withdrawn from the portfolio at any time, 


$$V(t) = V(0) + \sum_{i=1}^{t} \big(a(i)\Delta S(i) + b(i)\Delta\beta(i)\big), \quad t = 1, 2, \ldots, T. \qquad (11.8)$$

The initial value of the portfolio (a(t), b(t)) is $V(0) = a(1)S(0) + b(1)\beta(0)$ and
subsequent values V(t), t = 1, ..., T are given by

$$V(t) = a(t)S(t) + b(t)\beta(t). \qquad (11.9)$$


V(t) represents the value of the portfolio just before time t transactions, after
the time t price was observed. Since the market value of the portfolio (a(t), b(t))
at time t after S(t) is announced is $a(t)S(t) + b(t)\beta(t)$, and the value of the
newly set up portfolio is $a(t+1)S(t) + b(t+1)\beta(t)$, a self-financing strategy
must satisfy

$$a(t)S(t) + b(t)\beta(t) = a(t+1)S(t) + b(t+1)\beta(t). \qquad (11.10)$$


A strategy is called admissible if it is self-financing and the corresponding value 
process is non-negative. 

A contingent claim is a non-negative random variable X on $(\Omega, \mathcal{F}_T)$. It
represents an agreement which pays X at time T, for example, for a call with
strike K, $X = (S(T) - K)^+$.

Definition 11.2 A claim X is called attainable if there exists an admissible
strategy replicating the claim, that is, V(t) satisfies (11.8), $V(t) \ge 0$ and
V(T) = X.

Definition 11.3 An arbitrage opportunity is an admissible trading strategy
such that V(0) = 0, but EV(T) > 0.


Note that since $V(T) \ge 0$, EV(T) > 0 is equivalent to P(V(T) > 0) > 0. The
following result is central to the theory. 


Theorem 11.4 Suppose there is a probability measure Q, such that the discounted
stock process $Z(t) = S(t)/\beta(t)$ is a Q-martingale. Then for any admissible
trading strategy the discounted value process $V(t)/\beta(t)$ is also a
Q-martingale.


Such Q is called an equivalent martingale measure (EMM) or a risk-neutral 
probability measure. 


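PROOF: Since the market is finite, the value process V(t) takes only finitely
many values, therefore EV(t) exist. The martingale property is verified as
follows.

$$
\begin{aligned}
E_Q\Big(\frac{V(t+1)}{\beta(t+1)} \,\Big|\, \mathcal{F}_t\Big)
 &= E_Q\big(a(t+1)Z(t+1) + b(t+1) \,\big|\, \mathcal{F}_t\big) \\
 &= a(t+1)E_Q\big(Z(t+1) \,\big|\, \mathcal{F}_t\big) + b(t+1) && \text{since } a(t) \text{ and } b(t) \text{ are predictable} \\
 &= a(t+1)Z(t) + b(t+1) && \text{since } Z(t) \text{ is a martingale} \\
 &= a(t)Z(t) + b(t) && \text{since } (a(t), b(t)) \text{ is self-financing (11.10)} \\
 &= \frac{V(t)}{\beta(t)}.
\end{aligned}
$$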


This result is “nearly” the condition for no-arbitrage, since it states that
if we start with zero wealth, then positive wealth cannot be created if
probabilities are assigned by Q. Indeed, by the above result V(0) = 0 implies
$E_Q(V(T)) = 0$, and since $V(T) \ge 0$, Q(V(T) = 0) = 1. However, in the definition
of an arbitrage strategy the expectation is taken under the original probability
measure P. To establish the result for P, equivalence of probability measures is
used. Recall the definition from Chapter 10.


Definition 11.5 Two probability measures P and Q are called equivalent if
they have the same null sets, that is, for any set A with P(A) = 0, Q(A) = 0 and
vice versa.


Equivalent probability measures in a market model reflect the fact that in- 
vestors agree on the space of all possible outcomes, but assign different prob- 
abilities to the outcomes. The following result gives a probabilistic condition 
to assure that the model does not allow for arbitrage opportunities. It states 
that no arbitrage is equivalent to existence of EMM. 
Theorem 11.6 (First Fundamental Theorem) A market model does not
have arbitrage opportunities if and only if there exists a probability measure Q,
equivalent to P, such that the discounted stock process $Z(t) = S(t)/\beta(t)$ is a
Q-martingale.


PROOF: Proof of sufficiency. Suppose there exists a probability measure Q,
equivalent to P, such that the discounted stock process $Z(t) = S(t)/\beta(t)$ is a
Q-martingale. We show that then there are no arbitrage opportunities. By
Theorem 11.4 any admissible strategy with V(0) = 0 must have Q(V(T) > 0) = 0.
Since Q is equivalent to P, P(V(T) > 0) = 0. But then $E_P(V(T)) = 0$. Thus there
are no admissible strategies with V(0) = 0 and $E_P(V(T)) > 0$, in other words,
there is no arbitrage.

A proof of necessity requires additional concepts, see, for example, Harrison 
and Pliska (1981). 


Claims are priced by the replicating portfolios. 


Theorem 11.7 (Pricing by No-Arbitrage) Suppose that the market model
does not admit arbitrage, and X is an attainable claim with maturity T. Then
C(t), the arbitrage-free price of X at time $t \le T$, is given by V(t), the value
of a portfolio of any admissible strategy replicating X. Moreover

$$C(t) = V(t) = E_Q\Big(\frac{\beta(t)}{\beta(T)} X \,\Big|\, \mathcal{F}_t\Big), \qquad (11.11)$$

where Q is an equivalent martingale probability measure.


PROOF: Since X is attainable, it is replicated by an admissible strategy with
the value of the replicating portfolio V(t), $0 \le t \le T$, and X = V(T). Fix one
such strategy. To avoid arbitrage, the price of X at any time $t \le T$ must be
given by the value of this portfolio V(t), otherwise arbitrage profit is possible.

Since the model does not admit arbitrage a martingale probability measure
Q exists by Theorem 11.6. The discounted value process $V(t)/\beta(t)$ is a
Q-martingale by Theorem 11.4, hence by the martingale property

$$\frac{V(t)}{\beta(t)} = E_Q\Big(\frac{V(T)}{\beta(T)} \,\Big|\, \mathcal{F}_t\Big). \qquad (11.12)$$

But V(T) = X, and we have

$$\frac{V(t)}{\beta(t)} = E_Q\Big(\frac{X}{\beta(T)} \,\Big|\, \mathcal{F}_t\Big). \qquad (11.13)$$

Note that $\frac{\beta(t)}{\beta(T)}X$ represents the value of the claim X in the dollars at time t,
and the price of X at time t = 0 is given by

$$C(0) = E_Q\Big(\frac{\beta(0)}{\beta(T)} X\Big). \qquad (11.14)$$

Remark 11.1: The lhs. of equation (11.13), $V(t)/\beta(t) = a(t)Z(t) + b(t)$, is
determined by the portfolio a(t), b(t), but its rhs., $E_Q(\frac{X}{\beta(T)} \,|\, \mathcal{F}_t)$, is
determined by the measure Q and has nothing to do with a chosen portfolio. This
implies that for a given martingale measure Q, and any t, all self-financing
portfolios replicating X have the same value, moreover this common value is
also the same for different martingale probability measures. Thus equation
(11.11) provides an unambiguous price for the claim X at time t.

However, if a claim X is not attainable its expectation may vary with the 
measure Q, see Example 11.7 below. 
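
The pricing formula (11.11) can be illustrated in a multi-period Binomial model, which is a finite market model. The following Python sketch is only an illustration under stated assumptions and is not taken from the text: the function name `binomial_price` and the parameters `s0`, `d`, `u`, `r`, `n` are hypothetical, the stock is multiplied by d or u in each period, and the bond grows by the factor r per period with d < r < u. The price at time 0 is obtained by backward induction on C(t) = E_Q(C(t+1) | F_t)/r.

```python
def binomial_price(payoff, s0, d, u, r, n):
    """Time-0 price of X = payoff(S(n)) in an n-period Binomial model.

    Assumptions (illustrative): stock moves by factor d or u each period,
    bond grows by factor r each period, and d < r < u.
    """
    if not d < r < u:
        raise ValueError("need d < r < u for a martingale probability")
    p = (r - d) / (u - d)  # Q-probability of an up move
    # Claim values at maturity, indexed by the number j of up moves.
    values = [payoff(s0 * u**j * d**(n - j)) for j in range(n + 1)]
    # Step back one period at a time: discounted Q-expectation.
    for step in range(n, 0, -1):
        values = [(p * values[j + 1] + (1 - p) * values[j]) / r
                  for j in range(step)]
    return values[0]


# A call with strike 1 on a stock starting at 1, over 3 periods.
call_price = binomial_price(lambda s: max(s - 1.0, 0.0), 1.0, 0.9, 1.2, 1.05, 3)
```

As a sanity check, the claim X = S(T) should cost S(0), because the discounted stock price is a Q-martingale.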


Now we know how to price attainable claims. If all the claims in the market 
are attainable, then we can price any claim. 


Definition 11.8 Market models in which any claim is attainable are called 
complete. 


The following result characterizes complete models in terms of the martingale
measure. The proof can be found in Harrison and Kreps (1979) and Harrison and
Pliska (1983).


Theorem 11.9 (Completeness) The market model is complete if and only 
if the martingale probability measure Q is unique. 


Example 11.6: (A complete model) 

The one-step Binomial model (t = 0, 1) can be described by the payoff vector of the stock
(d, u) and of the bond (r, r). A claim X is a vector $(x_1, x_2)$ representing the payoff
of the claim when the market goes down, $x_1$, and up, $x_2$. As the two vectors (r, r)
and (d, u) span $\mathbb{R}^2$, any vector is a linear combination of those two, hence any claim
can be replicated by a portfolio, and the model is complete. To find a martingale
probability Q, we solve $E_Q Z(1) = Z(0)$, or $\frac{1}{r}(pu + (1-p)d) = 1$. The unique solution
is $p = \frac{r-d}{u-d}$. It is a probability if and only if d < r < u. Thus the existence and
uniqueness of the martingale measure is verified. In this model any claim can be
priced by no-arbitrage considerations.
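
The two ways of pricing in this example, by the martingale probability and by replication, can be compared numerically. The sketch below is a hedged illustration: the function name `one_step_binomial_price` and the normalisation S(0) = 1, β(0) = 1 are assumptions made here for concreteness.

```python
def one_step_binomial_price(x_down, x_up, d, u, r, s0=1.0):
    """Price the claim (x_down, x_up) in the one-step Binomial model.

    Assumptions (illustrative): stock pays s0*d or s0*u, one bond unit
    costs 1 at time 0 and pays r at time 1.
    Returns (p, price by Q-expectation, price by replication).
    """
    if not d < r < u:
        raise ValueError("need d < r < u for a martingale probability")
    p = (r - d) / (u - d)                          # Q-probability of the up move
    price_q = (p * x_up + (1 - p) * x_down) / r    # E_Q(X / beta(1))
    # Replicating portfolio: a shares and b bond units solving
    #   a*s0*d + b*r = x_down,   a*s0*u + b*r = x_up.
    a = (x_up - x_down) / (s0 * (u - d))
    b = (x_down - a * s0 * d) / r
    price_rep = a * s0 + b                         # V(0) = a S(0) + b beta(0)
    return p, price_q, price_rep


# Call with strike 1 when d = 0.9, u = 1.2, r = 1.05: payoff (0, 0.2).
p, price_q, price_rep = one_step_binomial_price(0.0, 0.2, 0.9, 1.2, 1.05)
```

Both prices agree, as the completeness argument predicts.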


Example 11.7: (An incomplete model) 

The model where the stock’s payoff can take 3 possible values (d, 1, u) and the bond with
payoff (1,1,1) is not complete. These vectors span a subspace of $\mathbb{R}^3$, therefore not
all possible returns can be replicated. For example, the claim that pays one dollar when the
stock goes up and nothing in any other case has the payoff (0,0,1) and cannot be
replicated. We now verify that there are many martingale measures. To find them
we must solve $E_Q Z(1) = Z(0)$:

$$\frac{1}{r}\big[p_u u + p_n + p_d d\big] = 1, \qquad p_u + p_n + p_d = 1.$$

Any solution of this system makes Z(t), t = 0, 1 into a martingale:

$$p_u = \frac{r-1}{u-1} + \frac{1-d}{u-1}\,p_d, \qquad p_n = \frac{u-r}{u-1} - \frac{u-d}{u-1}\,p_d, \qquad 0 \le p_u, p_n, p_d \le 1.$$
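
The non-uniqueness of the martingale measure, and its effect on the value of the non-attainable claim (0,0,1), can be seen numerically. The sketch below is illustrative only: the function names, the choice d = 0.5, u = 2, r = 1, and the ordering of states as (down, no change, up) are assumptions made here.

```python
def martingale_measure(d, u, r, p_d):
    """Return (p_d, p_n, p_u) for a given p_d, or None if not a probability vector.

    Uses the solution of the two linear equations
        (p_u*u + p_n + p_d*d) / r = 1,   p_u + p_n + p_d = 1.
    """
    p_u = (r - 1) / (u - 1) + (1 - d) / (u - 1) * p_d
    p_n = (u - r) / (u - 1) - (u - d) / (u - 1) * p_d
    probs = (p_d, p_n, p_u)
    if all(0.0 <= p <= 1.0 for p in probs):
        return probs
    return None


def discounted_expectation(payoff, probs, r):
    """E_Q(X / r) for a payoff vector ordered as (down, no change, up)."""
    return sum(x * p for x, p in zip(payoff, probs)) / r


d, u, r = 0.5, 2.0, 1.0
q1 = martingale_measure(d, u, r, 0.2)   # (0.2, 0.7, 0.1)
q2 = martingale_measure(d, u, r, 0.4)   # (0.4, 0.4, 0.2)
claim = (0.0, 0.0, 1.0)                 # pays 1 only when the stock goes up
v1 = discounted_expectation(claim, q1, r)
v2 = discounted_expectation(claim, q2, r)
```

Both q1 and q2 make the discounted stock a martingale, yet they assign different expectations to the claim, so no unique arbitrage price exists for it.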
Arbitrage in Continuous Time Models


In continuous time there are different versions of the no-arbitrage concept. 
The main premise is the same as in discrete time, it should be impossible to 
make “something” out of nothing without taking risk. The difference between 
different versions of no-arbitrage is in the kinds of allowable (admissible) self- 
financing strategies that define how “something” is made. 

Let a(t) and b(t) denote the number of shares and bond units respectively
held at time t. The market value of the portfolio at time t is given by

$$V(t) = a(t)S(t) + b(t)\beta(t). \qquad (11.15)$$

The change in the value of the portfolio due to change in the price of assets
during dt is $a(t)dS(t) + b(t)d\beta(t)$.


Definition 11.10 A portfolio (a(t), b(t)), $0 \le t \le T$, is called self-financing
if the change in value comes only from the change in prices of the assets,

$$dV(t) = a(t)dS(t) + b(t)d\beta(t), \qquad (11.16)$$

$$V(t) = V(0) + \int_0^t a(u)dS(u) + \int_0^t b(u)d\beta(u). \qquad (11.17)$$


It is assumed that S(t) and $\beta(t)$ are semimartingales. The processes a(t)
and b(t) must be predictable processes satisfying a certain condition for the
stochastic integral to be defined, see (8.8) and (8.52).

In a general situation, when both S(t) and $\beta(t)$ are stochastic, the rhs. of
(11.17) can be seen as a stochastic integral with respect to the vector process
$(S(t), \beta(t))$. Such integrals extend the standard definition of scalar stochastic
integrals and can be defined for a larger class of integrands, due to possible 
interaction between the components. We don’t go into details, as they are 
rather complicated, and refer to Shiryaev (1999), Jacod and Shiryaev (1987). 
The following concepts describe no-arbitrage: No Arbitrage (NA), No Free
Lunch (NFL), No Free Lunch with Bounded Risk (NFLBR), No Free Lunch 
with Vanishing Risk (NFLVR), No Feasible Free Lunch with Vanishing Risk 
(NFFLVR), see Shiryaev (1999), Kreps (1981), Delbaen and Schachermayer 
(1994), Kabanov (2001). We don’t aim to present the Arbitrage theory in 
continuous time, and consider a simpler formulation, following Harrison and 
Pliska (1981). 

We consider a model in continuous time $0 \le t \le T$ consisting of two assets,
a semimartingale S(t) representing the stock price process, and the savings
account (or bond) $\beta(t)$, $\beta(0) = 1$. We assume $\beta(t)$ is continuous and of finite
variation. The following is a central result.


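Theorem 11.11 (a(t), b(t)) is self-financing if and only if the discounted value
process $V(t)/\beta(t)$ is a stochastic integral with respect to the discounted price process

$$\frac{V(t)}{\beta(t)} = V(0) + \int_0^t a(u)dZ(u), \qquad (11.18)$$

where $Z(t) = S(t)/\beta(t)$.


PROOF: Using the assumption that the bond process is continuous and of
finite variation, we have

$$
d\Big(\frac{V(t)}{\beta(t)}\Big) = \frac{1}{\beta(t-)}dV(t) + V(t-)\,d\Big(\frac{1}{\beta(t)}\Big) + d\Big[V, \frac{1}{\beta}\Big](t)
 = \frac{1}{\beta(t)}dV(t) + V(t-)\,d\Big(\frac{1}{\beta(t)}\Big).
$$

Using the self-financing property

$$
\begin{aligned}
d\Big(\frac{V(t)}{\beta(t)}\Big)
 &= \frac{1}{\beta(t)}\big(a(t)dS(t) + b(t)d\beta(t)\big) + \big(a(t)S(t-) + b(t)\beta(t)\big)\,d\Big(\frac{1}{\beta(t)}\Big) \\
 &= a(t)\underbrace{\Big(\frac{1}{\beta(t)}dS(t) + S(t-)\,d\Big(\frac{1}{\beta(t)}\Big)\Big)}_{dZ(t) = d(S(t)/\beta(t))}
  + b(t)\underbrace{\Big(\frac{1}{\beta(t)}d\beta(t) + \beta(t)\,d\Big(\frac{1}{\beta(t)}\Big)\Big)}_{d(\beta(t)\cdot\frac{1}{\beta(t)}) = 0}
  = a(t)dZ(t).
\end{aligned}
$$

The other direction. Assume (11.18). From $V(t) = a(t)S(t) + b(t)\beta(t)$, we
find $b(t) = V(t)/\beta(t) - a(t)Z(t)$. Using (11.18),

$$b(t) = V(0) + \int_0^t a(u)dZ(u) - a(t)Z(t). \qquad (11.19)$$

Using (11.18), $V(t) = V(0)\beta(t) + \beta(t)\int_0^t a(u)dZ(u)$. Hence

$$
\begin{aligned}
dV(t) &= V(0)d\beta(t) + \Big(\int_0^t a(u)dZ(u)\Big)d\beta(t) + \beta(t)a(t)dZ(t) \\
 &= \Big(V(0) + \int_0^t a(u)dZ(u) - a(t)Z(t)\Big)d\beta(t) + a(t)Z(t)d\beta(t) + \beta(t)a(t)dZ(t) \\
 &= b(t)d\beta(t) + a(t)\,d\big(Z(t)\beta(t)\big) = b(t)d\beta(t) + a(t)dS(t),
\end{aligned}
$$

and the self-financing property of V(t) is established.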


The basis for mathematical formulation of arbitrage is the existence of 
the EMM, the equivalent martingale probability measure. In a general model 
existence of such a measure is introduced as an assumption. 


EMM Assumption 


There exists a martingale probability measure Q which is equivalent to the
original measure P, such that the discounted price process $Z(t) = S(t)/\beta(t)$
is a Q-martingale.


We give examples of models where the EMM assumption does not and does 
hold. 


Example 11.8: $dS(t) = 0.04\,S(t)dt$, $d\beta(t) = 0.03\,\beta(t)dt$. Then $S(t) = S(0)e^{0.04t}$,
$\beta(t) = e^{0.03t}$, and $S(t)e^{-0.03t} = S(0)e^{0.01t}$. Since $e^{0.01t}$ is a deterministic
non-constant function, there is no probability measure Q that would make it into
a martingale.


Example 11.9: $S(t) = S(0) + \int_0^t B(s)ds$, where B(s) is a P-Brownian motion.
By Corollary 10.21, an equivalent change of measure results in B(t) being
transformed into B(t) + q(t), for some process q(t). Thus under the EMM Q,
$S(t) = S(0) + \int_0^t B(s)ds + \int_0^t q(s)ds$. Since S(t) has finite variation, it cannot
be a martingale, because a continuous martingale is either a constant or has infinite
variation (see Theorem 7.29). Hence there is no EMM in this model.


Example 11.10: (Bachelier model)

$$S(t) = S(0) + \mu t + \sigma B(t), \qquad (11.20)$$

for positive constants $\mu$, $\sigma$, and $\beta(t) = 1$. The EMM exists (and is unique) by
Girsanov’s theorem.


Example 11.11: (Black-Scholes model)

$$dS(t) = \mu S(t)dt + \sigma S(t)dB(t), \qquad \beta(t) = e^{rt}. \qquad (11.21)$$

Solving the SDE for S(t), $Z(t) = S(t)e^{-rt} = S(0)e^{(\mu - r - \frac{1}{2}\sigma^2)t + \sigma B(t)}$. This process is a
martingale if and only if $\mu = r$; when $\mu = r$, it is the exponential martingale of $\sigma B(t)$,
when $\mu \ne r$, it is a martingale times a non-constant deterministic function, easily
seen not to be a martingale. Writing $dS(t) = \sigma S(t)(\frac{\mu}{\sigma}dt + dB(t))$, and using the change of
drift in diffusions, there is (a unique) Q, such that $\frac{\mu}{\sigma}dt + dB(t) = \frac{r}{\sigma}dt + dW(t)$ for a
Q-Brownian motion W(t). So $\sigma B(t) = rt + \sigma W(t) - \mu t$. Thus the equation for Z(t)
in terms of W(t) is

$$Z(t) = S(0)e^{(\mu - r - \frac{1}{2}\sigma^2)t + \sigma B(t)} = S(0)e^{-\frac{1}{2}\sigma^2 t + \sigma W(t)},$$

verifying that Q is the EMM.


Admissible Strategies 


The discounted value of a replicating self-financing portfolio $V(t)/\beta(t)$ is a
stochastic integral with respect to the Q-martingale Z(t), Theorem 11.11. We
would like it to be a martingale, because then all its values can be determined
by its final value, which is matched to the claim X. The martingale property
implies $V(t)/\beta(t) = E_Q(V(T)/\beta(T) \,|\, \mathcal{F}_t) = E_Q(X/\beta(T) \,|\, \mathcal{F}_t)$.

However, a stochastic integral with respect to a martingale is only a local 
martingale. Thus the discounted value process of a self-financing portfolio in 
(11.18) is a local martingale under Q. Since it is non-negative, it is a super- 
martingale (Theorem 7.23). Supermartingales have non-increasing expecta- 
tions. In particular, there are strategies (called suicidal) that can turn the ini- 
tial investment into nothing. Adding such a strategy to any other self-financing 
portfolio will change the initial value without changing the final value. Thus 
there are self-financing strategies with the same final value but different initial 
values. This phenomenon is precisely the difference in the situation between 
the finite market model and the general model. Note that, similar to the finite 
market model, a self-financing strategy cannot create something out of noth- 
ing, since the expectations of such strategies are non-increasing. To eliminate 
the undesirable strategies from consideration, only martingale strategies are 
admissible. We follow Harrison and Pliska (1981). 

Fix a reference EMM, equivalent martingale probability measure Q, so 
that the discounted stock price process Z(t) is a Q-martingale; expectations 
are taken with respect to this measure Q. 


Definition 11.12 A predictable and self-financing strategy (a(t), b(t)) is called
admissible if

$$\int_0^t a^2(u)\,d[Z,Z](u) \ \text{ is finite and locally integrable}, \quad 0 \le t \le T. \qquad (11.22)$$

Moreover $V(t)/\beta(t)$ is a non-negative Q-martingale.

Note that condition (11.22) is needed to define the stochastic integral
$\int_0^t a(u)dZ(u)$, see condition (8.8). If (11.22) holds then $\int_0^t a(u)dZ(u)$ and
consequently $V(t)/\beta(t)$ are local martingales. If moreover $V(t)/\beta(t) \ge 0$ then it
is a supermartingale.


Pricing of Claims 


A claim is a non-negative random variable. It is attainable if it is integrable,
$EX < \infty$, and there exists an admissible trading strategy such that at maturity
T, V(T) = X. To avoid arbitrage, the value of an attainable claim at time
$t \le T$ must be the same as that of the replicating portfolio at t.


Theorem 11.13 The price C(t) at time t of an attainable claim X is given
by the value of an admissible replicating portfolio V(t), moreover

$$C(t) = E_Q\Big(\frac{\beta(t)}{\beta(T)} X \,\Big|\, \mathcal{F}_t\Big). \qquad (11.23)$$

The proof follows by the martingale property of $V(t)/\beta(t)$ in exactly the same
way as in the finite model case.

$$
C(t) = V(t) = \beta(t)E_Q\Big(\frac{V(T)}{\beta(T)} \,\Big|\, \mathcal{F}_t\Big)
 = \beta(t)E_Q\Big(\frac{X}{\beta(T)} \,\Big|\, \mathcal{F}_t\Big)
 = E_Q\Big(\frac{\beta(t)}{\beta(T)}X \,\Big|\, \mathcal{F}_t\Big).
$$


Since attainable claims can be priced, the natural question is “how can 
one tell whether a claim is attainable?”. The following result gives an answer 
using the predictable representation property of the discounted stock price. 


Theorem 11.14 Let X be an integrable claim and let $M(t) = E_Q(\frac{X}{\beta(T)} \,|\, \mathcal{F}_t)$,
$0 \le t \le T$. Then X is attainable if and only if M(t) can be represented in the
form

$$M(t) = M(0) + \int_0^t H(u)dZ(u)$$

for some predictable process H. Moreover $V(t)/\beta(t) = M(t)$ is the same for
any admissible portfolio that replicates X.


PROOF: Suppose X is attainable, and that (a(t), b(t)) replicates it. By the
previous result $V(t)/\beta(t) = M(t)$. It follows by (11.18) that the desired
representation holds with H(t) = a(t). Conversely, if $M(t) = M(0) + \int_0^t H(u)dZ(u)$,
take a(t) = H(t), and $b(t) = M(0) + \int_0^t H(u)dZ(u) - H(t)Z(t)$. This gives a
self-financing strategy.
Completeness of a Market Model


A market model is complete if any integrable claim is attainable, in other
words, can be replicated by a self-financing portfolio. If a model is complete,
then any claim can be priced by no-arbitrage considerations.

We know by Theorem 11.14 that for a claim to be attainable the martingale
$M(t) = E_Q(\frac{X}{\beta(T)} \,|\, \mathcal{F}_t)$ should have a predictable representation with respect to
the Q-martingale $Z(t) = S(t)/\beta(t)$. Recall that the martingale Z(t) has the
predictable representation property if any other martingale can be represented 
as a stochastic integral with respect to it, see Definition 8.34. For results on 
predictable representations see Section 8.12. In particular, if the martingale 
Z(t) has the predictable representation property, then all claims in the model 
are attainable. It turns out that the opposite is also true, moreover there is a 
surprising characterization, the equivalent martingale measure Q is unique. 


Theorem 11.15 (Second Fundamental Theorem) The following are equiv- 
alent: 


1. The market model is complete. 
2. The martingale Z(t) has the predictable representation property. 
3. The EMM Q, that makes $Z(t) = S(t)/\beta(t)$ into a martingale, is unique.


11.4 Diffusion and the Black-Scholes Model 


In this section we apply general results to the diffusion models of stock prices. 
In a diffusion model the stock price is assumed to satisfy 


dS(t) = u(S(t))dt + o(S(t))dB(t), (11.24) 


where B(t) is a P-Brownian motion. The bond price is assumed to be deterministic
and continuous, $\beta(t) = \exp(\int_0^t r(u)du)$. According to Theorem 11.13 pricing
of claims is done under the martingale probability measure Q that makes
$Z(t) = S(t)/\beta(t)$ into a martingale.


Theorem 11.16 Let $H(t) = -\frac{\mu(S(t)) - r(t)S(t)}{\sigma(S(t))}$. Suppose that $\mathcal{E}(\int_0^\cdot H(t)dB(t))$
is a martingale. Then the EMM exists and is unique. It is defined by

$$\Lambda = \frac{dQ}{dP} = \mathcal{E}\Big(\int_0^\cdot H(t)dB(t)\Big)(T) = e^{\int_0^T H(t)dB(t) - \frac{1}{2}\int_0^T H^2(t)dt}. \qquad (11.25)$$

The SDE for S(t) under Q with a Brownian motion W, is

$$dS(t) = r(t)S(t)dt + \sigma(S(t))dW(t). \qquad (11.26)$$

PROOF: It is easy to see by Itô’s formula that Z(t) satisfies

$$
dZ(t) = d\Big(\frac{S(t)}{\beta(t)}\Big)
 = \frac{\mu(S(t)) - r(t)S(t)}{\beta(t)}dt + \frac{\sigma(S(t))}{\beta(t)}dB(t)
 = \frac{\sigma(S(t))}{\beta(t)}\Big(\frac{\mu(S(t)) - r(t)S(t)}{\sigma(S(t))}dt + dB(t)\Big).
$$

For Z(t) to be a martingale it must have zero drift coefficient. Define Q by
(11.25). Using (10.34), $\frac{\mu(S(t)) - r(t)S(t)}{\sigma(S(t))}dt + dB(t) = dW(t)$ with a Q-Brownian
motion W(t). This gives the SDE

$$dZ(t) = d\Big(\frac{S(t)}{\beta(t)}\Big) = \frac{\sigma(S(t))}{\beta(t)}dW(t).$$

Using integration by parts we obtain the SDE (11.26) under the EMM Q for
S(t).


To price claims, expectations should be calculated by using equation (11.26),
and not the original one in (11.24). The effect of change of measure is the
change in the drift: $\mu(x)$ is changed into rx, where r is the riskless interest
rate.


Black-Scholes Model 


The Black-Scholes model is the commonly accepted model for pricing of claims
in the financial industry. The main assumptions of the model are: the riskless
interest rate is a constant r, and $\sigma(S(t)) = \sigma S(t)$, where the constant $\sigma$ is called
“volatility”. The stock price process S(t) satisfies the SDE

$$dS(t) = \mu S(t)dt + \sigma S(t)dB(t). \qquad (11.27)$$

Using Itô’s formula with f(x) = ln x (see Example 5.5) we find that the solution
is given by

$$S(t) = S(0)e^{(\mu - \frac{1}{2}\sigma^2)t + \sigma B(t)}. \qquad (11.28)$$

This model corresponds to the simplest random model for the return R(t) on
stock

$$dR(t) = \frac{dS(t)}{S(t)} = \mu\,dt + \sigma\,dB(t). \qquad (11.29)$$

S(t) is the stochastic exponential of R(t), dS(t) = S(t)dR(t). By Theorem
8.12 $S(t) = S(0)\mathcal{E}(R)(t) = S(0)e^{R(t) - \frac{1}{2}[R,R](t)}$, giving (11.28).
The EMM Q makes $S(t)e^{-rt}$ into a martingale. By Theorem 11.16 it exists
and is unique. It is obtained by letting $\frac{\mu}{\sigma}dt + dB(t) = \frac{r}{\sigma}dt + dW(t)$, for a
Q-Brownian motion W(t). In this case $H(t) = -\frac{\mu - r}{\sigma}$, $\int_0^t H(s)dB(s) = -\frac{\mu - r}{\sigma}B(t)$
and $\mathcal{E}(\int_0^\cdot H(t)dB(t))(T) = \mathcal{E}(-\frac{\mu - r}{\sigma}B)(T)$ is the exponential martingale of
Brownian motion. The SDE (11.26) under the EMM Q for S(t) is

$$dS(t) = rS(t)dt + \sigma S(t)dW(t). \qquad (11.30)$$

It has the solution

$$S(t) = S(0)e^{(r - \frac{1}{2}\sigma^2)t + \sigma W(t)}. \qquad (11.31)$$

Thus under the equivalent martingale measure Q, S(T) has a Lognormal
distribution, $LN((r - \frac{1}{2}\sigma^2)T + \ln S(0), \sigma^2 T)$. The price at time t of a claim X
paying at time T is given by

$$C(t) = e^{-r(T-t)}E_Q(X \,|\, \mathcal{F}_t). \qquad (11.32)$$

If X = g(S(T)), then by the Markov property of S(t),
$C(t) = e^{-r(T-t)}E_Q(g(S(T)) \,|\, \mathcal{F}_t) = e^{-r(T-t)}E_Q(g(S(T)) \,|\, S(t))$.
The conditional distribution under Q given $\mathcal{F}_t$ is obtained
from the equation (using (11.31))

$$S(T) = S(t)e^{(r - \frac{1}{2}\sigma^2)(T-t) + \sigma(W(T) - W(t))}$$

and is Lognormal, $LN((r - \frac{1}{2}\sigma^2)(T-t) + \ln S(t), \sigma^2(T-t))$.


Pricing a Call Option 

A call option pays $X = (S(T) - K)^+$ at time T. To find its price at time t = 0,
according to (11.32), we must calculate $E(S(T) - K)^+$, where expectation E is
taken under the arbitrage-free probability Q. In the Black-Scholes model the
Q-distribution of S(T) is Lognormal. Denote for brevity the parameters of
this distribution by $\mu$ and $\sigma^2$, with $\mu = (r - \frac{1}{2}\sigma^2)T + \ln S(0)$ and the new
$\sigma^2$ equal to the old $\sigma^2 T$.

$$E(S(T) - K)^+ = E(S(T) - K)I(S(T) > K) = ES(T)I(S(T) > K) - KQ(S(T) > K).$$

The second term is easy to calculate by using Normal probabilities as
$K(1 - \Phi((\ln K - \mu)/\sigma))$.

The first term can be calculated by a direct integration, or by changing
measure, see Example 10.2. S(T) is absorbed into the Likelihood
$dQ_1/dQ = \Lambda = S(T)/ES(T)$. Now, by Lognormality of S(T) write it as $e^Y$, with
$Y \sim N(\mu, \sigma^2)$. Then $ES(T) = Ee^Y = e^{\mu + \sigma^2/2}$, and

$$
\begin{aligned}
E(e^Y I(e^Y > K)) &= Ee^Y\, E\big((e^Y/Ee^Y)I(e^Y > K)\big) = Ee^Y\, E(\Lambda I(e^Y > K)) \\
 &= Ee^Y\, E_{Q_1} I(e^Y > K) = Ee^Y\, Q_1(e^Y > K).
\end{aligned}
$$

Since $dQ_1/dQ = e^{Y - \mu - \sigma^2/2}$, it follows by Theorem 10.4 that under $Q_1$, Y has the
$N(\mu + \sigma^2, \sigma^2)$ distribution. Therefore

$$E(e^Y I(e^Y > K)) = e^{\mu + \sigma^2/2}\big(1 - \Phi((\ln K - \mu - \sigma^2)/\sigma)\big).$$
Finally, using $1 - \Phi(x) = \Phi(-x)$, we obtain

$$E(S(T) - K)^+ = e^{\mu + \sigma^2/2}\,\Phi\Big(\frac{\mu + \sigma^2 - \ln K}{\sigma}\Big) - K\,\Phi\Big(\frac{\mu - \ln K}{\sigma}\Big). \qquad (11.33)$$

The price at time t of the European call option on stock with strike K and
maturity T is given by (Theorem 11.13)

$$C(t) = e^{-r(T-t)}E_Q\big((S(T) - K)^+ \,|\, \mathcal{F}_t\big).$$

It follows from (11.33) that C(t) is given by the Black-Scholes formula

$$C(t) = S(t)\Phi(h(t)) - Ke^{-r(T-t)}\Phi\big(h(t) - \sigma\sqrt{T-t}\big), \qquad (11.34)$$

where

$$h(t) = \frac{\ln\frac{S(t)}{K} + (r + \frac{1}{2}\sigma^2)(T-t)}{\sigma\sqrt{T-t}}. \qquad (11.35)$$
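
The Black-Scholes formula is easy to evaluate numerically. The sketch below is illustrative and not part of the text: the function names `norm_cdf`, `bs_call` and `call_via_lognormal` are chosen here, and the parameter `tau` stands for the time to maturity T - t. The second function evaluates the Lognormal expectation of the call payoff directly with the parameters μ and σ² introduced above, and discounts it, so that the two routes can be compared.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard Normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call price for stock price s, strike k, tau = T - t > 0."""
    h = (log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return s * norm_cdf(h) - k * exp(-r * tau) * norm_cdf(h - sigma * sqrt(tau))


def call_via_lognormal(s0, k, r, sigma, T):
    """Discounted E(S(T) - K)^+ with S(T) = e^Y, Y ~ N(mu, v^2) under Q."""
    mu = (r - 0.5 * sigma**2) * T + log(s0)
    v = sigma * sqrt(T)
    expectation = (exp(mu + 0.5 * v**2) * norm_cdf((mu + v**2 - log(k)) / v)
                   - k * norm_cdf((mu - log(k)) / v))
    return exp(-r * T) * expectation


price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)   # approximately 10.45
```

The two functions agree at t = 0 up to rounding, which is the content of the step from the Lognormal expectation to the closed formula.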


Pricing of Claims by a PDE. Replicating Portfolio 


Let X be a claim of the form X = g(S(T)). Since the stock price satisfies SDE
(11.30), by the Markov property of S(t) it follows from (11.32) that the price
of X at time t is

$$C(t) = e^{-r(T-t)}E_Q\big(g(S(T)) \,|\, \mathcal{F}_t\big) = e^{-r(T-t)}E_Q\big(g(S(T)) \,|\, S(t)\big). \qquad (11.36)$$

By the Feynman-Kac formula (Theorem 6.8),

$$C(x,t) = e^{-r(T-t)}E_Q\big(g(S(T)) \,|\, S(t) = x\big)$$

solves the following partial differential equation (PDE)

$$\frac{1}{2}\sigma^2 x^2 \frac{\partial^2 C(x,t)}{\partial x^2} + rx\frac{\partial C(x,t)}{\partial x} + \frac{\partial C(x,t)}{\partial t} - rC(x,t) = 0. \qquad (11.37)$$

The boundary condition is given by the value of the claim at maturity

$$C(x,T) = g(x), \quad x \ge 0,$$

where g(x) is the value of the claim at time T when the stock price is x. When
x = 0 equation (11.37) gives

$$C(0,t) = e^{-r(T-t)}g(0), \quad 0 \le t \le T,$$

(a portfolio of only a bond has its value at time t as the discounted final value.)
For a call option $g(x) = (x - K)^+$. This partial differential equation was solved
by Black and Scholes with the Fourier transform method. The solution is the
Black-Scholes formula (11.34).
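
That the Black-Scholes formula solves the PDE can be checked numerically by replacing the partial derivatives with central finite differences. The sketch below is a rough check under assumptions made here, not a proof: the function names and the step sizes `dx`, `dt` are hypothetical choices, and the residual is only expected to be small, not exactly zero.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard Normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call price with time to maturity tau = T - t > 0."""
    h = (log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return s * norm_cdf(h) - k * exp(-r * tau) * norm_cdf(h - sigma * sqrt(tau))


def pde_residual(x, t, k, r, sigma, T, dx=1e-2, dt=1e-4):
    """Left-hand side of the Black-Scholes PDE at (x, t), by finite differences."""
    def c(xx, tt):
        return bs_call(xx, k, r, sigma, T - tt)

    c_x = (c(x + dx, t) - c(x - dx, t)) / (2.0 * dx)
    c_xx = (c(x + dx, t) - 2.0 * c(x, t) + c(x - dx, t)) / dx**2
    c_t = (c(x, t + dt) - c(x, t - dt)) / (2.0 * dt)
    return 0.5 * sigma**2 * x**2 * c_xx + r * x * c_x + c_t - r * c(x, t)


residual = pde_residual(100.0, 0.5, 95.0, 0.05, 0.2, 1.0)
```

The individual terms of the equation are of order one here, so a residual many orders of magnitude smaller indicates that the terms cancel as the PDE requires.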

Next we give an argument to find the replicating portfolio, which is also
used to re-derive the PDE (11.37). It is clear that C(t) in (11.36) is a function
of S(t) and t. Let (a(t), b(t)) be a replicating portfolio. Since it is self-financing
its value process satisfies

$$dV(t) = a(t)dS(t) + b(t)d\beta(t). \qquad (11.38)$$

If X is an attainable claim, that is, X = V(T), then the arbitrage-free value
of X at time t < T is the value of the portfolio

$$V(t) = a(t)S(t) + b(t)\beta(t) = C(S(t), t). \qquad (11.39)$$

Thus we have by (11.38)

$$dC(S(t), t) = dV(t) = a(t)dS(t) + b(t)d\beta(t). \qquad (11.40)$$

Assume that C(x, t) is smooth enough to apply Itô’s formula. Then from the
SDE for the stock price (11.30) (from which $d[S,S](t) = \sigma^2 S^2(t)dt$), we have

$$dC(S(t), t) = \frac{\partial C(S(t),t)}{\partial x}dS(t) + \frac{\partial C(S(t),t)}{\partial t}dt + \frac{1}{2}\frac{\partial^2 C(S(t),t)}{\partial x^2}\sigma^2 S^2(t)dt. \qquad (11.41)$$


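By equating the two expressions above we have

$$\Big(a(t) - \frac{\partial C(S(t),t)}{\partial x}\Big)dS(t) = \Big(\frac{\partial C(S(t),t)}{\partial t} + \frac{1}{2}\frac{\partial^2 C(S(t),t)}{\partial x^2}\sigma^2 S^2(t)\Big)dt - b(t)d\beta(t). \qquad (11.42)$$

The lhs has a positive quadratic variation unless $a(t) - \frac{\partial C(S(t),t)}{\partial x} = 0$, and the
rhs has zero quadratic variation. For them to be equal we must have for all t

$$a(t) = \frac{\partial C(S(t),t)}{\partial x}, \qquad (11.43)$$

and consequently

$$b(t)d\beta(t) = \Big(\frac{\partial C(S(t),t)}{\partial t} + \frac{1}{2}\frac{\partial^2 C(S(t),t)}{\partial x^2}\sigma^2 S^2(t)\Big)dt. \qquad (11.44)$$

Putting the values of a(t) and $b(t)\beta(t)$ into equation (11.39), taking into
account $d\beta(t) = r\beta(t)dt$, and replacing S(t) by x, we obtain the PDE (11.37).
The replicating portfolio is given by (11.43) and (11.44).


Using the Black-Scholes formula, as the solution of this PDE, the replicating
portfolio is obtained from (11.43) and (11.39)

$$a(t) = \Phi(h(t)), \qquad b(t) = -Ke^{-rT}\Phi\big(h(t) - \sigma\sqrt{T-t}\big), \qquad (11.45)$$

where h(t) is given by (11.35).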
Another way to derive the Black-Scholes PDE, useful in other models, is 
given in Exercise 11.8. 
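
The replicating portfolio of a call can be checked numerically against the two relations it must satisfy: the number of shares equals the derivative of the price in the stock variable, and the portfolio value equals the option price. The sketch below is illustrative only. It assumes the standard closed forms a(t) = Φ(h(t)) and b(t) = -K e^{-rT} Φ(h(t) - σ√(T-t)), obtained by differentiating the Black-Scholes formula, and β(t) = e^{rt}; the function names are chosen here.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard Normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def h_value(s, k, r, sigma, tau):
    """h(t) of the Black-Scholes formula with tau = T - t > 0."""
    return (log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))


def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call price."""
    h = h_value(s, k, r, sigma, tau)
    return s * norm_cdf(h) - k * exp(-r * tau) * norm_cdf(h - sigma * sqrt(tau))


def replicating_portfolio(s, k, r, sigma, t, T):
    """Shares a(t) and bond units b(t) replicating the call, with beta(t) = e^{rt}."""
    tau = T - t
    h = h_value(s, k, r, sigma, tau)
    a = norm_cdf(h)
    b = -k * exp(-r * T) * norm_cdf(h - sigma * sqrt(tau))
    return a, b


s, k, r, sigma, t, T = 100.0, 95.0, 0.03, 0.25, 0.4, 1.0
a, b = replicating_portfolio(s, k, r, sigma, t, T)
portfolio_value = a * s + b * exp(r * t)          # a(t) S(t) + b(t) beta(t)
eps = 1e-3                                        # step for the numerical derivative
numerical_delta = (bs_call(s + eps, k, r, sigma, T - t)
                   - bs_call(s - eps, k, r, sigma, T - t)) / (2.0 * eps)
```

The negative value of b(t) reflects that replicating a call requires borrowing at the riskless rate.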
Validity of the Assumptions


The simulated time series of a geometric Brownian motion (with $\sigma = 1$ and
various $\mu$'s (265 points)) look similar to the time series of observed stock prices
(daily stock closing prices for the period 5 Aug 91 - 7 Aug 92), see Fig. 11.1. This
shows that the suggested model is capable of producing realistic looking price 
processes, but, of course, does not prove that this is the correct model. The 
assumption of normality can be checked by looking at histograms and using 
formal statistical tests. The histogram of the BHP stock returns points to 
normality. However, the assumption of constant volatility does not seem to be 
true, and this issue is addressed next. 


Implied Volatility 


Denote by $C^m(K,T)$ the observed market price of an option with strike K and
expiration T. The implied volatility $I_t(K,T)$ is defined as the value of the
volatility parameter $\sigma$ in the Black-Scholes formula (11.34) that matches the
observed price, namely

$$C^{BS}(I_t(K,T)) = C^m(K,T), \qquad (11.46)$$

where $C^{BS}$ is given by (11.34). It has been observed that the implied volatility
as a function of strike K (and of term T) is not a constant, and has a graph 
that looks like a smile. Models with stochastic volatility are able to reproduce 
the observed behaviour of implied volatilities, see Fouque et al. (2000). 
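
Since the Black-Scholes call price is increasing in the volatility parameter, the implied volatility can be found by a simple root search. The sketch below uses bisection. It is an illustration under assumptions made here: the function name `implied_vol`, the search interval [1e-6, 5] and the tolerance are hypothetical choices, and the "market" price in the usage lines is generated from the formula itself rather than observed.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard Normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call price with time to maturity tau > 0."""
    h = (log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return s * norm_cdf(h) - k * exp(-r * tau) * norm_cdf(h - sigma * sqrt(tau))


def implied_vol(price, s, k, r, tau, lo=1e-6, hi=5.0, tol=1e-10):
    """Volatility at which the Black-Scholes price matches `price` (bisection)."""
    if not bs_call(s, k, r, lo, tau) <= price <= bs_call(s, k, r, hi, tau):
        raise ValueError("price is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, r, mid, tau) < price:
            lo = mid        # price too low: volatility must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip: price an option at sigma = 0.3, then recover the volatility.
market_price = bs_call(100.0, 110.0, 0.02, 0.3, 0.75)
recovered = implied_vol(market_price, 100.0, 110.0, 0.02, 0.75)
```

Applied to observed prices across several strikes, such a routine would produce the curve of implied volatility against strike described above.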


Stochastic Volatility Models 


A class of models in which the volatility parameter is not a constant, but a 
stochastic process itself is known as stochastic volatility models. These models 
were introduced to explain the smile in the implied volatility. An example of 
such is the Heston (1993) model, in which the stock price S(t) and the volatility 
v(t) satisfy the following SDEs under the EMM, 


$$
\begin{aligned}
dS(t) &= rS(t)dt + \sqrt{v(t)}\,S(t)dB(t), \\
dv(t) &= \alpha(\mu - v(t))dt + \delta\sqrt{v(t)}\,dW(t),
\end{aligned} \qquad (11.47)
$$


where the Brownian motions B and W are correlated. Stochastic volatility 
models are incomplete, see Example 8.26. Therefore a replicating portfolio 
involves a stock and another option. Pricing by using an EMM or replicating 
by a portfolio with a stock and another option, leads to a PDE for the price of 
an option, by using the Feynman-Kac formula with a two-dimensional diffusion 
process, see Fouque et al. (2000), p.45. Heston (1993) derived a PDE and its 
solution for the transform of the option price. The price itself is obtained by 
the inversion of the transform. 
Figure 11.1: Simulated and real stock prices.

Figure 11.2: Lognormal LN(0,1) and Normal N(0,1) densities.

Figure 11.3: Histogram of BHP daily returns over 30 days.


11.5 Change of Numeraire


A numeraire is the asset in which values of other assets are measured. The
no-arbitrage arguments imply that absence of arbitrage opportunities can be
stated in terms of existence of the equivalent martingale probability measure,
EMM, (risk-neutral measure) that makes the price process measured in a chosen
numeraire into a martingale. If the savings account $\beta(t) = e^{rt}$ is a numeraire,
then the price S(t) is expressed in units of the savings account, and
Q makes $Z(t) = S(t)/e^{rt}$ into a martingale. But we can choose the stock price
to be the numeraire, then the pricing probability measure, EMM $Q_1$, is the
one that makes $\beta(t)/S(t)$ into a martingale. Change of numeraire is used for
currency options and interest rate options.

The price of an attainable claim paying X at T, $C = E_Q(\frac{1}{\beta(T)}X)$, can be
calculated also as $C = E_{Q_1}(\frac{S(0)}{S(T)}X)$. The way to change measures and express
the prices of claims under different numeraires are given by the next results.
Note that both assets S(t) and $\beta(t)$ can be stochastic, the only requirement is
that $S(t)/\beta(t)$, $0 \le t \le T$, is a positive martingale.


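Theorem 11.17 (Change of numeraire) Let $S(t)/\beta(t)$, $0 \le t \le T$, be a
positive Q-martingale. Define $Q_1$ by

$$\frac{dQ_1}{dQ} = \Lambda(T) = \frac{S(T)/S(0)}{\beta(T)/\beta(0)}. \qquad (11.48)$$

Then under $Q_1$, $\beta(t)/S(t)$ is a martingale. Moreover, the price of an attainable
claim X at time t is related under the different numeraires by the formula

$$C(t) = E_Q\Big(\frac{\beta(t)}{\beta(T)}X \,\Big|\, \mathcal{F}_t\Big) = E_{Q_1}\Big(\frac{S(t)}{S(T)}X \,\Big|\, \mathcal{F}_t\Big). \qquad (11.49)$$

PROOF: $\Lambda(t) = \frac{S(t)/S(0)}{\beta(t)/\beta(0)}$ is a positive Q-martingale with $E_Q(\Lambda(T)) = 1$.
Therefore by Theorem 10.12

$$\frac{1}{\Lambda(t)} = \frac{\beta(t)/\beta(0)}{S(t)/S(0)} \ \text{ is a } Q_1\text{-martingale.} \qquad (11.50)$$

By the general Bayes formula Theorem 10.10 (10.23)

$$
E_Q\Big(\frac{\beta(t)}{\beta(T)}X \,\Big|\, \mathcal{F}_t\Big)
 = \frac{E_{Q_1}\big(\frac{1}{\Lambda(T)}\frac{\beta(t)}{\beta(T)}X \,\big|\, \mathcal{F}_t\big)}{E_{Q_1}\big(\frac{1}{\Lambda(T)} \,\big|\, \mathcal{F}_t\big)}
 = E_{Q_1}\Big(\frac{\Lambda(t)}{\Lambda(T)}\frac{\beta(t)}{\beta(T)}X \,\Big|\, \mathcal{F}_t\Big)
 = E_{Q_1}\Big(\frac{S(t)}{S(T)}X \,\Big|\, \mathcal{F}_t\Big).
$$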
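A General Option Pricing Formula


As a corollary we obtain the price of a call option in a general setting.

$$E_Q\Big(\frac{1}{\beta(T)}(S(T) - K)^+\Big) = E_Q\Big(\frac{S(T)}{\beta(T)}I(S(T) > K)\Big) - E_Q\Big(\frac{K}{\beta(T)}I(S(T) > K)\Big).$$

The first term is evaluated by changing the numeraire,

$$E_Q\Big(\frac{S(T)}{\beta(T)}I(S(T) > K)\Big) = E_{Q_1}\Big(\frac{1}{\Lambda(T)}\frac{S(T)}{\beta(T)}I(S(T) > K)\Big) = \frac{S(0)}{\beta(0)}Q_1(S(T) > K).$$

The second term is $KQ(S(T) > K)/\beta(T)$, when $\beta(t)$ is deterministic. Thus

$$C = \frac{S(0)}{\beta(0)}Q_1(S(T) > K) - \frac{K}{\beta(T)}Q(S(T) > K). \qquad (11.51)$$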

This is a generalization of the Black-Scholes formula (Geman et al. (1995), 
Björk (1998), Klebaner (2002)). One can verify that the Black-Scholes formula 
is also obtained when the stock is used as the numeraire (Exercise 11.12). 
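
In the Black-Scholes model the two probabilities in the general formula are Normal probabilities, and the decomposition can be checked against the closed formula. The sketch below is illustrative and makes the following assumptions: β(t) = e^{rt} with β(0) = 1; under Q the logarithm of S(T) is Normal with mean ln S(0) + (r - σ²/2)T and variance σ²T; under the measure Q_1 with the stock as numeraire the drift of the stock is (r + σ²)S, so the mean becomes ln S(0) + (r + σ²/2)T. The function names are chosen here.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard Normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def prob_exceeds(s0, k, drift, sigma, T):
    """P(S(T) > K) when ln S(T) ~ N(ln s0 + (drift - sigma^2/2) T, sigma^2 T)."""
    mean = log(s0) + (drift - 0.5 * sigma**2) * T
    return norm_cdf((mean - log(k)) / (sigma * sqrt(T)))


def call_by_numeraire(s0, k, r, sigma, T):
    """C = S(0) Q_1(S(T) > K) - K e^{-rT} Q(S(T) > K)."""
    q = prob_exceeds(s0, k, r, sigma, T)               # savings account numeraire
    q1 = prob_exceeds(s0, k, r + sigma**2, sigma, T)   # stock numeraire
    return s0 * q1 - k * exp(-r * T) * q


def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call price, for comparison."""
    h = (log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    return s * norm_cdf(h) - k * exp(-r * tau) * norm_cdf(h - sigma * sqrt(tau))


c_numeraire = call_by_numeraire(100.0, 105.0, 0.04, 0.3, 2.0)
c_formula = bs_call(100.0, 105.0, 0.04, 0.3, 2.0)
```

The agreement of the two values shows how Φ(h) and Φ(h - σ√T) in the Black-Scholes formula can be read as exercise probabilities under the two numeraires.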

If $\beta(t)$ is stochastic, then by using a T-bond as a numeraire,

$$E_Q\Big(\frac{K}{\beta(T)}I(S(T) > K)\Big) = P(0,T)\,K\,Q_T(S(T) > K),$$

where $P(0,T) = E_Q(1/\beta(T))$, $\Lambda(T) = \frac{1}{P(0,T)\beta(T)}$. To evaluate expectations
above, we need to find the distribution under the new measure. This can be
done by using the SDE under $Q_1$.


SDEs under a Change of Numeraire 


Let S(t) and $\beta(t)$ be positive processes. Let Q and $Q_1$ be respectively the
equivalent martingale measures when $\beta(t)$ and S(t) are numeraires, so that
$S(t)/\beta(t)$ is a Q-martingale, and $\beta(t)/S(t)$ a $Q_1$-martingale. Suppose that
under Q

$$
\begin{aligned}
dS(t) &= \mu_S(t)dt + \sigma_S(t)dB(t), \\
d\beta(t) &= \mu_\beta(t)dt + \sigma_\beta(t)dB(t),
\end{aligned}
$$

where B is a Q-Brownian motion, and coefficients are adapted processes. Let
X(t) have the SDEs under Q and $Q_1$, with a $Q_1$-Brownian motion $B^1$,

$$
\begin{aligned}
dX(t) &= \mu_0(X(t))dt + \sigma(X(t))dB(t), \\
dX(t) &= \mu_1(X(t))dt + \sigma(X(t))dB^1(t).
\end{aligned}
$$

Since these measures are equivalent, the diffusion coefficient for X is the same,
and the drift coefficients are related by

Theorem 11.18

$$\mu_1(X(t)) = \mu_0(X(t)) + \sigma(X(t))\Big(\frac{\sigma_S(t)}{S(t)} - \frac{\sigma_\beta(t)}{\beta(t)}\Big). \qquad (11.52)$$
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PROOF: We know from Section 10.3, that Q, and Q are related by 


ar) = i =e ( f no t)dB(t D) (T), 


with H(t) = AWW), On the other hand, by Theorem 11.17 $34 = 
A(T) = ZSO. Moreover A(t) = Eo(A(T)|F:) = ZRS. by the Gis 
gale property of (S/3). Using the exponential SDE, 

S(t)/ oat 

B(t)/G(0) 


dA(t) = A(t)H(t)dB(t) = d ( 


it follows that 


~ St) BO) 
Using the SDEs for S and 6, and that 5/8 has no dt term, the result follows. 


HE) D apy — (SED), 


Corollary 11.19 The SDE for $S(t)$ under $Q_1$, when it is a numeraire, is

$$dS(t) = \left(\mu_S(t) + \frac{\sigma_S^2(t)}{S(t)}\right)dt + \sigma_S(t)dB^1(t), \tag{11.53}$$

where $B^1(t)$ is a $Q_1$-Brownian motion. Moreover, in the Black-Scholes model, when
$\mu_S = \mu S$ and $\sigma_S = \sigma S$, the new drift coefficient is $(\mu + \sigma^2)S$.
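As a hedged numerical illustration of Corollary 11.19 (a sketch, not part of the original derivation), take the Black-Scholes model with a deterministic savings account $\beta(t) = e^{rt}$. The general option pricing formula then reads $S(0)Q_1(S(T)>K) - Ke^{-rT}Q(S(T)>K)$, where $S$ has drift $r$ under $Q$ and, by the corollary, drift $r + \sigma^2$ under $Q_1$. The function names and the parameter values below are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_above(s0, k, drift, sigma, T):
    """P(S(T) > K) when dS = drift*S dt + sigma*S dB, so S(T) is Lognormal."""
    mean_log = math.log(s0) + (drift - 0.5 * sigma ** 2) * T
    return norm_cdf((mean_log - math.log(k)) / (sigma * math.sqrt(T)))

def call_by_numeraire(s0, k, r, sigma, T):
    """Call price as S(0) Q_1(S(T) > K) - K e^{-rT} Q(S(T) > K)."""
    q1 = prob_above(s0, k, r + sigma ** 2, sigma, T)  # drift r + sigma^2 under Q_1
    q = prob_above(s0, k, r, sigma, T)                # drift r under Q
    return s0 * q1 - k * math.exp(-r * T) * q

# Illustrative parameters (assumed, not from the text)
print(call_by_numeraire(100.0, 100.0, 0.05, 0.2, 1.0))
```

The two probabilities are exactly the $\Phi$ terms of the Black-Scholes formula, which is the content of Exercise 11.12.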


For other results on change of numeraire see Geman et al. (1995). 


11.6 Currency (FX) Options 


Currency options involve at least two markets, a domestic one and a foreign
one. For simplicity we consider just one foreign market. For details on arbitrage
theory in Foreign Market Derivatives see Musiela and Rutkowski (1998).

The foreign and domestic interest rates in riskless accounts are denoted by
$r_f$ and $r_d$ (subscripts $f$ and $d$ denoting foreign and domestic respectively).
The EMMs exist in both markets, $Q_f$ and $Q_d$. Let $U(t)$ denote the value
of a foreign asset in foreign currency at time $t$, say JPY. Assume that $U(t)$
evolves according to the Black-Scholes model, and write its equation under the
EMM (risk-neutral) measure,

$$dU(t) = r_f U(t)dt + \sigma_U U(t)dB_f(t), \tag{11.54}$$

where $B_f$ is a $Q_f$-Brownian motion. A similar equation holds for assets in
the domestic market under its EMM $Q_d$. The price process discounted by $e^{r_d t}$
in the domestic market is a $Q_d$-martingale, and an option on an asset in the
domestic market which pays $C(T)$ (in domestic currency) at $T$ has its time $t$
price

$$C(t) = e^{-r_d(T-t)} E_{Q_d}(C(T)|\mathcal{F}_t).$$

A link between the two markets is provided by the exchange rate. Take $X(t)$
to be the price in the domestic currency of 1 unit of foreign currency at time
$t$. Note that $X(t)$ itself is not an asset, but it gives the price of a foreign asset
in domestic currency when multiplied by the price of the foreign asset. We
assume that $X(t)$ follows the Black-Scholes SDE under the domestic EMM $Q_d$

$$dX(t) = \mu_X X(t)dt + \sigma_X X(t)dB_d(t),$$

and show that $\mu_X$ must be $r_d - r_f$. Take the foreign savings account $e^{r_f t}$; its
value in domestic currency is $X(t)e^{r_f t}$. Thus $e^{-r_d t}X(t)e^{r_f t}$ should be a
$Q_d$-martingale, as a discounted price process in the domestic market. But from (11.6)

$$e^{-r_d t}X(t)e^{r_f t} = X(0)e^{(r_f - r_d + \mu_X)t}\, e^{-\frac{1}{2}\sigma_X^2 t + \sigma_X B_d(t)},$$

implying $\mu_X = r_d - r_f$. Thus the equation for $X(t)$ under the domestic EMM
$Q_d$ is

$$dX(t) = (r_d - r_f)X(t)dt + \sigma_X X(t)dB_d(t), \tag{11.55}$$

with a $Q_d$-Brownian motion $B_d(t)$. In general, the Brownian motions $B_d$ and
$B_f$ are correlated, i.e. for some $|\rho| < 1$, $B_f(t) = \rho B_d(t) + \sqrt{1-\rho^2}\,W(t)$, with
independent Brownian motions $B_d$ and $W$, so that $d[B_d, B_f](t) = \rho\,dt$. The model
can also be set up by using two-dimensional independent Brownian motions,
a (column) vector $Q_d$-Brownian motion $[B_d, W]$, and a matrix $\sigma_U$ (here the row
vector $\sigma_U = [\rho, \sqrt{1-\rho^2}]$).


The value of the foreign asset in domestic currency at time $t$ is given by
$U(t)X(t) = \tilde{U}(t)$. Therefore $e^{-r_d t}\tilde{U}(t)$, as a discounted price process, is a
$Q_d$-martingale. One can easily see from (11.54) and (11.55) that

$$e^{-r_d t}\tilde{U}(t) = X(0)U(0)\,e^{-\frac{1}{2}(\sigma_U^2 + \sigma_X^2)t + \sigma_U B_f(t) + \sigma_X B_d(t)}.$$

For it to be a $Q_d$-martingale, it must hold that $dB_f(t) + \rho\sigma_X dt = d\tilde{B}_d(t)$, where
$\tilde{B}_d$ is a $Q_d$-Brownian motion (correlated with $B_d$). This is accomplished by Girsanov’s
theorem, and the SDE for $U(t)$ under the domestic EMM $Q_d$ becomes

$$dU(t) = (r_f - \rho\sigma_U\sigma_X)U(t)dt + \sigma_U U(t)d\tilde{B}_d(t). \tag{11.56}$$

Now it is also easy to see that the process $\tilde{U}(t)$ (the foreign asset in domestic
currency) under the domestic EMM $Q_d$ has the SDE

$$d\tilde{U}(t) = r_d\tilde{U}(t)dt + \left(\sigma_U d\tilde{B}_d(t) + \sigma_X dB_d(t)\right)\tilde{U}(t). \tag{11.57}$$

Since

$$\sigma_U d\tilde{B}_d(t) + \sigma_X dB_d(t) = \sigma\,dW_d(t),$$

for some $Q_d$-Brownian motion $W_d(t)$, with

$$\sigma^2 = \sigma_U^2 + \sigma_X^2 + 2\rho\sigma_U\sigma_X, \tag{11.58}$$

the volatility of $\tilde{U}(t)$ is given by the formula (11.58), which combines the foreign
asset volatility and the exchange rate volatility.


Options on Foreign Currency 


Consider a call option on one unit of foreign currency, say on JPY with the
strike price $K_d$ in AUD. Its payoff at $T$ in AUD is $(X(T) - K_d)^+$. Thus its
price at time $t$ is given by

$$C(t) = e^{-r_d(T-t)} E_{Q_d}\left((X(T) - K_d)^+ \,|\, \mathcal{F}_t\right).$$

The conditional $Q_d$-distribution of $X(T)$ given $X(t)$ is Lognormal, by using
(11.6), and standard calculations give the Black-Scholes currency formula
(Garman-Kohlhagen (1983))

$$C(t) = X(t)e^{-r_f(T-t)}\Phi(h(t)) - K_d e^{-r_d(T-t)}\Phi\left(h(t) - \sigma_X\sqrt{T-t}\right),$$

$$h(t) = \frac{\ln\frac{X(t)}{K_d} + \left(r_d - r_f + \sigma_X^2/2\right)(T-t)}{\sigma_X\sqrt{T-t}}.$$

Other options on currency are priced similarly, by using equation (11.6).
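The currency formula above is straightforward to evaluate numerically. The following is a minimal sketch, assuming illustrative parameter values; the function names are hypothetical and `tau` stands for the time to expiry $T - t$.

```python
import math

def norm_cdf(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fx_call(x, k_d, r_d, r_f, sigma_x, tau):
    """Price of a call on one unit of foreign currency (Garman-Kohlhagen form).

    x: current exchange rate X(t), k_d: strike in domestic currency,
    r_d, r_f: domestic and foreign rates, sigma_x: exchange rate volatility.
    """
    sd = sigma_x * math.sqrt(tau)
    h = (math.log(x / k_d) + (r_d - r_f + 0.5 * sigma_x ** 2) * tau) / sd
    return (x * math.exp(-r_f * tau) * norm_cdf(h)
            - k_d * math.exp(-r_d * tau) * norm_cdf(h - sd))

# Illustrative values (assumed): a higher foreign rate lowers the call price.
print(fx_call(100.0, 100.0, 0.05, 0.00, 0.2, 1.0))
print(fx_call(100.0, 100.0, 0.05, 0.03, 0.2, 1.0))
```

With $r_f = 0$ the formula reduces to the Black-Scholes price of a call on a non-dividend-paying asset, which gives a quick sanity check.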


Options on Foreign Assets Struck in Foreign Currency 


Consider options on a foreign asset denominated in the domestic currency.
A call option pays at time $T$ the amount $X(T)(U(T) - K_f)^+$. Since the
amount is specified in the domestic currency the pricing should be done under
the domestic EMM $Q_d$. The following argument achieves the result. In the
foreign market options are priced by the Black-Scholes formula. The price in
domestic currency is obtained by multiplying by $X(t)$. A call option is priced
by

$$C(t) = X(t)\left(U(t)\Phi(h(t)) - K_f e^{-r_f(T-t)}\Phi\left(h(t) - \sigma_U\sqrt{T-t}\right)\right),$$

where

$$h(t) = \frac{\ln\frac{U(t)}{K_f} + \left(r_f + \frac{1}{2}\sigma_U^2\right)(T-t)}{\sigma_U\sqrt{T-t}}.$$

Calculations of the price under $Q_d$ are left as an exercise in change of
numeraire, Exercise 11.13.


Options on Foreign Assets Struck in Domestic Currency


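Consider a call option on the foreign asset $U(t)$ with strike price $K_d$ in AUD.
Its payoff at $T$ in AUD is $(X(T)U(T) - K_d)^+ = (\tilde{U}(T) - K_d)^+$. Thus its price
at time $t$ is given by

$$C(t) = e^{-r_d(T-t)} E_{Q_d}\left((\tilde{U}(T) - K_d)^+ \,|\, \mathcal{F}_t\right).$$

The conditional distribution of $\tilde{U}(T)$ given $\tilde{U}(t)$ is Lognormal, by equation
(11.57), with volatility given by $\sigma^2 = \sigma_U^2 + \sigma_X^2 + 2\rho\sigma_U\sigma_X$ from (11.58). Thus

$$C(t) = \tilde{U}(t)\Phi(h(t)) - K_d e^{-r_d(T-t)}\Phi\left(h(t) - \sigma\sqrt{T-t}\right),$$

$$h(t) = \frac{\ln\frac{\tilde{U}(t)}{K_d} + \left(r_d + \frac{\sigma^2}{2}\right)(T-t)}{\sigma\sqrt{T-t}}.$$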
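To make the role of the combined volatility (11.58) concrete, here is a small sketch that prices the call struck in domestic currency. The function names and numerical inputs are illustrative assumptions; `tau` denotes $T - t$.

```python
import math

def norm_cdf(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def combined_vol(sigma_u, sigma_x, rho):
    """Volatility of the foreign asset in domestic currency, formula (11.58)."""
    return math.sqrt(sigma_u ** 2 + sigma_x ** 2 + 2.0 * rho * sigma_u * sigma_x)

def call_struck_domestic(u, x, k_d, r_d, sigma_u, sigma_x, rho, tau):
    """Call on the foreign asset with strike k_d in domestic currency."""
    u_dom = u * x                      # asset value converted to domestic currency
    sd = combined_vol(sigma_u, sigma_x, rho) * math.sqrt(tau)
    h = (math.log(u_dom / k_d) + r_d * tau + 0.5 * sd ** 2) / sd
    return u_dom * norm_cdf(h) - k_d * math.exp(-r_d * tau) * norm_cdf(h - sd)

# Illustrative values (assumed): negative correlation reduces the volatility.
print(combined_vol(0.3, 0.4, 0.0), combined_vol(0.3, 0.4, -0.5))
print(call_struck_domestic(100.0, 1.0, 100.0, 0.05, 0.2, 0.1, 0.3, 1.0))
```

Note that the foreign rate $r_f$ does not enter: under $Q_d$ the converted asset $\tilde{U}(t)$ has drift $r_d$, as in (11.57).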


Guaranteed Exchange Rate (Quanto) Options


A quanto option pays in domestic currency the foreign payoff converted at a
predetermined rate. For example, a quanto call pays at time $T$ the amount
$X(0)(U(T) - K_f)^+$ ($K_f$ is in foreign currency). Thus its time $t$ value is given
by

$$C(t) = X(0)e^{-r_d(T-t)} E_{Q_d}\left((U(T) - K_f)^+ \,|\, \mathcal{F}_t\right).$$

The conditional distribution of $U(T)$ given $U(t)$ is Lognormal, by equation
(11.56), with volatility $\sigma_U$. Standard calculations give

$$C(t) = X(0)e^{-r_d(T-t)}\left(U(t)e^{\delta(T-t)}\Phi(h(t)) - K_f\Phi\left(h(t) - \sigma_U\sqrt{T-t}\right)\right),$$

where $\delta = r_f - \rho\sigma_U\sigma_X$ and

$$h(t) = \frac{\ln\frac{U(t)}{K_f} + \left(\delta + \frac{\sigma_U^2}{2}\right)(T-t)}{\sigma_U\sqrt{T-t}}.$$
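The quanto formula can be sketched in code as follows. This is an illustration under assumed parameter values, with hypothetical function names; `tau` is $T - t$ and `delta` is the adjusted drift $r_f - \rho\sigma_U\sigma_X$ from (11.56).

```python
import math

def norm_cdf(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quanto_call(u, k_f, x0, r_d, r_f, sigma_u, sigma_x, rho, tau):
    """Quanto call paying X(0) (U(T) - K_f)^+ in domestic currency."""
    delta = r_f - rho * sigma_u * sigma_x          # drift of U under Q_d, see (11.56)
    sd = sigma_u * math.sqrt(tau)
    h = (math.log(u / k_f) + (delta + 0.5 * sigma_u ** 2) * tau) / sd
    undiscounted = u * math.exp(delta * tau) * norm_cdf(h) - k_f * norm_cdf(h - sd)
    return x0 * math.exp(-r_d * tau) * undiscounted

# Illustrative values (assumed): positive correlation lowers the quanto call price.
print(quanto_call(100.0, 100.0, 1.0, 0.05, 0.05, 0.2, 0.1, 0.0, 1.0))
print(quanto_call(100.0, 100.0, 1.0, 0.05, 0.05, 0.2, 0.1, 0.5, 1.0))
```

When $\rho = 0$, $r_f = r_d$ and $X(0) = 1$ the expression collapses to the ordinary Black-Scholes call price, a convenient consistency check.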


11.7 Asian, Lookback and Barrier Options 


Asian Options 


We assume that $S(t)$ satisfies the Black-Scholes SDE, which we write under
the EMM $Q$

$$dS(t) = rS(t)dt + \sigma S(t)dB(t).$$

Asian options pay at time $T$ the amount $C(T)$ given by
$\left(\frac{1}{T}\int_0^T S(u)du - K\right)^+$ (fixed strike $K$) or
$\left(\frac{1}{T}\int_0^T S(t)dt - KS(T)\right)^+$ (floating strike $KS(T)$). Both
kinds involve the integral average of the stock price $\bar{S} = \frac{1}{T}\int_0^T S(u)du$. The
average of Lognormals is not an analytically tractable distribution, therefore
direct calculations do not lead to a closed form solution for the price of the
option. Pricing by using PDEs was done by Rogers and Shi (1995), and Vecer
(2002). We present Vecer’s approach, which rests on the idea that it is possible
to replicate the integral average of the stock $\bar{S}$ by a self-financing portfolio.
Let

$$a(t) = \frac{1}{rT}\left(1 - e^{-r(T-t)}\right)$$

and

$$dV(t) = a(t)dS(t) + r(V(t) - a(t)S(t))dt = rV(t)dt + a(t)(dS(t) - rS(t)dt), \tag{11.59}$$

with $V(0) = a(0)S(0) = \frac{1}{rT}(1 - e^{-rT})S(0)$. It is easy to see that $V(t)$ is a
self-financing portfolio (Exercise 11.14). Solving the SDE (11.59) (by looking
at $d(V(t)e^{-rt})$) for $V(t)$ and using integration by parts, we obtain

$$V(T) = e^{rT}V(0) + a(T)S(T) - e^{rT}a(0)S(0) - \int_0^T e^{r(T-t)}S(t)da(t) = \frac{1}{T}\int_0^T S(t)dt, \tag{11.60}$$

because of

$$d\left(e^{r(T-t)}S(t)a(t)\right) = e^{r(T-t)}a(t)dS(t) - re^{r(T-t)}a(t)S(t)dt + e^{r(T-t)}S(t)da(t),$$

and

$$a(T)S(T) - e^{rT}a(0)S(0) = \int_0^T e^{r(T-t)}a(t)(dS(t) - rS(t)dt) + \int_0^T e^{r(T-t)}S(t)da(t).$$

The self-financing portfolio $V(t)$ consists of $a(t)$ shares and
$b(t) = \frac{e^{-rT}}{T}\int_0^t S(u)du$ in cash (units of the savings account $e^{rt}$),
and has the value $\bar{S}$ at time $T$.
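The replication claim (11.60) can be checked numerically on a simulated path. The sketch below is an illustration only: it discretizes (11.59) with a simple Euler step on one Black-Scholes path and compares the terminal portfolio value with the trapezoidal average of the path. The step count, seed and parameter values are assumptions, and the two numbers agree only up to discretization error.

```python
import math
import random

def replicate_average(s0, r, sigma, T, n_steps, seed=0):
    """Return (V(T), average of S) along one simulated path."""
    rng = random.Random(seed)
    dt = T / n_steps

    def a(t):
        # number of shares held at time t
        return (1.0 - math.exp(-r * (T - t))) / (r * T)

    s = s0
    v = a(0.0) * s0                      # initial wealth V(0) = a(0) S(0)
    integral = 0.0
    for k in range(n_steps):
        t = k * dt
        z = rng.gauss(0.0, 1.0)
        s_next = s * math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        # self-financing update (11.59): dV = a dS + r (V - a S) dt
        v += a(t) * (s_next - s) + r * (v - a(t) * s) * dt
        integral += 0.5 * (s + s_next) * dt   # trapezoidal rule for the integral of S
        s = s_next
    return v, integral / T

v_T, s_bar = replicate_average(100.0, 0.05, 0.2, 1.0, 2000)
print(v_T, s_bar)
```

Because $a(t)$ is deterministic and of finite variation, the identity holds path by path, so a single path suffices for the check.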

Consider next pricing the option with the payoff $(\bar{S} - K_1S(T) - K_2)^+$,
which encompasses both the fixed and the floating kinds (by taking one of the $K$’s
as 0). To replicate such an option hold at time $t$ $a(t)$ of the stock, start with
initial wealth $V(0) = a(0)S(0) - e^{-rT}K_2$ and follow the self-financing strategy
(11.59). The terminal value of this portfolio is $V(T) = \bar{S} - K_2$. The payoff of
the option is

$$(\bar{S} - K_1S(T) - K_2)^+ = (V(T) - K_1S(T))^+.$$

Thus the price of the option at time $t$ is

$$C(t) = e^{-r(T-t)}E_Q\left((V(T) - K_1S(T))^+ \,|\, \mathcal{F}_t\right).$$

Let $Z(t) = V(t)/S(t)$. Then proceeding,

$$C(t) = e^{-r(T-t)}E_Q\left(S(T)(Z(T) - K_1)^+ \,|\, \mathcal{F}_t\right).$$

The multiplier $S(T)$ inside is dealt with by the change of numeraire, and the
remaining expectation is that of a diffusion $Z(t)$, which satisfies the backward
PDE. Using $S(t)$ as a numeraire (Theorem 11.17, (11.49)) we have

$$C(t) = S(t)E_{Q_1}\left((Z(T) - K_1)^+ \,|\, \mathcal{F}_t\right). \tag{11.61}$$

By Itô’s formula the SDE for $Z(t)$ under $Q$, when $e^{rt}$ is a numeraire, is

$$dZ(t) = \sigma^2(Z(t) - a(t))dt + \sigma(a(t) - Z(t))dB(t).$$

By Theorem 11.18 the SDE for $Z(t)$ under the EMM $Q_1$, when $S(t)$ is a
numeraire, is

$$dZ(t) = -\sigma(Z(t) - a(t))dB^1(t),$$

where $B^1(t)$ is a $Q_1$-Brownian motion ($B^1(t) = B(t) - \sigma t$). Now we can write
a PDE for the conditional expectation in (11.61).

$$E_{Q_1}\left((Z(T) - K_1)^+ \,|\, \mathcal{F}_t\right) = E_{Q_1}\left((Z(T) - K_1)^+ \,|\, Z(t)\right),$$

by the Markov property. Hence

$$u(x,t) = E_{Q_1}\left((Z(T) - K_1)^+ \,|\, Z(t) = x\right)$$

satisfies the PDE (see Theorem 6.6)

$$\frac{\sigma^2}{2}(x - a(t))^2\frac{\partial^2 u}{\partial x^2} + \frac{\partial u}{\partial t} = 0,$$

subject to the boundary condition $u(x,T) = (x - K_1)^+$. Finally, the price of
the option at time zero is given by using (11.61)

$$V(0, S(0), K_1, K_2) = S(0)\,u(Z(0), 0),$$

with $Z(0) = V(0)/S(0) = \frac{1}{rT}(1 - e^{-rT}) - e^{-rT}K_2/S(0)$.


Lookback Options 


A lookback call pays $X = S(T) - S_*$, and a lookback put $X = S^* - S(T)$,
where $S_*$ and $S^*$ denote the smallest and the largest values of stock on $[0,T]$.
The price of a lookback put is given by

$$C = e^{-rT}E_Q(S^* - S(T)) = e^{-rT}E_Q(S^*) - e^{-rT}E_Q(S(T)), \tag{11.62}$$

where $Q$ is the martingale probability measure.

Since $S(t)e^{-rt}$ is a $Q$-martingale, $e^{-rT}E_Q(S(T)) = S(0)$. To find $E_Q(S^*)$,
the $Q$-distribution of $S^*$ is needed. The equation for $S(t)$ is given by
$S(t) = S(0)e^{(r - \frac{1}{2}\sigma^2)t + \sigma B(t)}$ with a $Q$-Brownian motion $B(t)$, see (11.31). Clearly, $S^*$
satisfies

$$S^* = S(0)\,e^{\left((r - \frac{1}{2}\sigma^2)t + \sigma B(t)\right)^*}.$$

The distribution of the maximum of a Brownian motion with drift is found by
a change of measure, see Exercise 10.6 and Example 10.4 (10.38). By
Girsanov’s theorem $(r - \frac{1}{2}\sigma^2)t + \sigma B(t) = \sigma W(t)$, for a $Q_1$-Brownian motion
$W(t)$, with $\frac{dQ}{dQ_1}(W_{[0,T]}) = e^{cW(T) - \frac{1}{2}c^2T}$, $c = (r - \frac{1}{2}\sigma^2)/\sigma$. Clearly,
$S^* = S(0)e^{\sigma W^*}$. The distribution of $W^*$ under $Q_1$ is the distribution of the maximum
of Brownian motion, and its distribution under $Q$ is obtained as in Example
10.4 (see (10.38)). Therefore

$$e^{-rT}E_Q(S^*) = e^{-rT}S(0)\int_{-\infty}^{\infty} e^{\sigma y}f_{W^*}(y)dy, \tag{11.63}$$


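where $f_{W^*}(y)$ is obtained from (10.38) (see also Exercise 11.15). Lookback
calls are priced similarly. The price of a Lookback call is given by $S(0)$ times

$$\left(1 + \frac{\sigma^2}{2r}\right)\Phi\left(\frac{\ln(K/S(0)) + \frac{\sigma^2T}{2}}{\sigma\sqrt{T}}\right) - \left(1 - \frac{\sigma^2}{2r}\right)e^{-rT}\Phi\left(\frac{\ln(K/S(0)) - \frac{\sigma^2T}{2}}{\sigma\sqrt{T}}\right) - \frac{\sigma^2}{2r}. \tag{11.64}$$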


Barrier Options 


Examples of Barrier Options are down-and-out calls or down-and-in calls.
There are also corresponding put options. A down-and-out call gets knocked
out when the price falls below a certain level $H$; it has the payoff
$(S(T) - K)^+I(S_* \ge H)$, $S(0) > H$, $K > H$. A down-and-in call has the payoff
$(S(T) - K)^+I(S_* \le H)$. To price these options the joint distribution of $S(T)$
and $S_*$ is needed under $Q$. For example, the price of a down-and-in call is given
by

$$C = e^{-rT}E_Q\left((S(T) - K)^+I(S_* \le H)\right) = e^{-rT}\int_{\frac{\ln(K/S(0))}{\sigma}}^{\infty}\int_{-\infty}^{\frac{\ln(H/S(0))}{\sigma}}\left(S(0)e^{\sigma x} - K\right)g(x,y)\,dy\,dx,$$

where $g(x,y)$ is the probability density of $(W(T), W_*(T))$ under the martingale
measure $Q$. It is found by changing measure as described above and in Example
10.4 (see also Exercise 11.16). Double barrier options have payoffs depending
on $S(T)$, $S_*$ and $S^*$. An example is a call which pays only if the price never
goes below a certain level $H_1$ or above a certain level $H_2$, with the payoff

$$X = (S(T) - K)^+I(H_1 < S_* \le S^* < H_2).$$

The joint distribution of Brownian motion and its minimum and maximum is
given by Theorem 3.23. By using the change of measure described above, the
joint distribution of $S(T)$, $S_*$, $S^*$ can be found and double barrier options can
be priced. Calculations are based on the following result.

Let $W_\mu(t) = \mu t + B(t)$. Then the probability of not hitting the barriers $a$ and
$b$ on the interval $[0,t]$, given the values at the end points $W_\mu(0) = x$ and
$W_\mu(t) = y$,

$$P(a,b,x,y,t) = P\left(a < \min_{0\le s\le t}W_\mu(s) \le \max_{0\le s\le t}W_\mu(s) < b \,\Big|\, W_\mu(0) = x,\ W_\mu(t) = y\right),$$

does not depend on $\mu$, and is given by (e.g. Borodin and Salminen (1996))

$$P(a,b,x,y,t) = \exp\left\{\frac{(y-x)^2}{2t}\right\} \times \sum_{n=-\infty}^{\infty}\left(\exp\left\{-\frac{(y-x+2n(b-a))^2}{2t}\right\} - \exp\left\{-\frac{(y-x+2n(b-a)+2(x-a))^2}{2t}\right\}\right) \tag{11.65}$$

if $a < x < b$, $a < y < b$, and zero otherwise. This formula becomes simpler
when one of the barriers is infinite, namely if $a = -\infty$ (single high barrier) we
get

$$P(-\infty,b,x,y,t) = 1 - \exp\left\{-\frac{2(b-x)(b-y)}{t}\right\} \tag{11.66}$$

if $x < b$, $y < b$, and zero otherwise. If $b = \infty$ (single low barrier) we get

$$P(a,\infty,x,y,t) = 1 - \exp\left\{-\frac{2(x-a)(y-a)}{t}\right\} \tag{11.67}$$

if $a < x$, $a < y$, and zero otherwise.
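The three bridge formulas (11.65)-(11.67) are easy to evaluate, and the single-barrier cases give a check on the series. The sketch below is illustrative: the function names are hypothetical, the infinite series is truncated at an assumed number of terms, and the inputs are arbitrary.

```python
import math

def stay_inside_prob(a, b, x, y, t, n_terms=50):
    """Formula (11.65): bridge from x to y over [0, t] stays inside (a, b)."""
    if not (a < x < b and a < y < b):
        return 0.0
    total = 0.0
    for n in range(-n_terms, n_terms + 1):
        shift = y - x + 2.0 * n * (b - a)
        total += (math.exp(-shift ** 2 / (2.0 * t))
                  - math.exp(-(shift + 2.0 * (x - a)) ** 2 / (2.0 * t)))
    return math.exp((y - x) ** 2 / (2.0 * t)) * total

def stay_below_prob(b, x, y, t):
    """Formula (11.66): single high barrier b."""
    if x >= b or y >= b:
        return 0.0
    return 1.0 - math.exp(-2.0 * (b - x) * (b - y) / t)

def stay_above_prob(a, x, y, t):
    """Formula (11.67): single low barrier a."""
    if x <= a or y <= a:
        return 0.0
    return 1.0 - math.exp(-2.0 * (x - a) * (y - a) / t)

# A very low barrier a should reproduce the single high barrier case.
print(stay_inside_prob(-50.0, 1.0, 0.0, 0.0, 1.0), stay_below_prob(1.0, 0.0, 0.0, 1.0))
print(stay_inside_prob(-1.0, 1.0, 0.0, 0.0, 1.0))
```

In Monte Carlo pricing of barrier options such probabilities are commonly applied between consecutive simulated points to account for barrier crossings that the discrete path misses; that use is mentioned here only as context, not as part of the text above.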


11.8 Exercises 


Exercise 11.1: (Payoff Functions and Diagrams) 
Graph the following payoffs. 


1. A straddle consists of buying a call and a put with the same exercise 
price and expiration date. 


2. A butterfly spread consists of buying two calls with exercise prices $K_1$
and $K_3$ and selling a call with exercise price $K_2$, $K_1 < K_2 < K_3$.


Exercise 11.2: (Binomial Pricing Model)


1. Give the probability space for the Binomial model.

2. Show that the stock price can be expressed as $S(t+1) = S(t)\xi_{t+1}$, where
$\xi_{t+1}$ is the return in period $t+1$, $t = 0,1,\ldots,n-1$, and the variables
$\xi_n$ are independent and identically distributed with values $u$ and $d$.


3. Show that the Binomial model does not admit arbitrage if and only if
$d < r < u$.


4. Describe the arbitrage-free probability $Q$, and show that the discounted
stock price $S_t/r^t$, $t = 0,1,\ldots,n$, is a $Q$-martingale.

5. Show that this model is complete.


6. Show that if the price of an option is given by (11.7), then arbitrage
strategies do not exist.


Exercise 11.3: Verify that in the model given in Example 11.7 any attainable
claim has the same price under any of the martingale measures. Give an
example of an unattainable claim $X$ and show that $E_Q(X)$ is different for
different martingale measures $Q$.


Exercise 11.4: Show that if $Q$ is equivalent to $P$ and $X \ge 0$, then $E_P(X) > 0$
implies $E_Q(X) > 0$, and vice versa.


Exercise 11.5: (Pricing in incomplete markets)


1. Show that if $M(t)$, $0 \le t \le T$, is a martingale under two different
probability measures $Q$ and $P$, then for $s < t$, $E_Q(M(t)|\mathcal{F}_s) = E_P(M(t)|\mathcal{F}_s)$
a.s. If in addition $M(0)$ is non-random, then $E_PM(t) = E_QM(t)$.


2. Show that the price of an attainable claim $X$, $C(t) = \beta(t)E_Q(X/\beta(T)|\mathcal{F}_t)$,
is the same for all martingale measures.


Exercise 11.6: (Non-completeness in mixed models.)

In this exercise the price of an asset is modelled as a sum of a diffusion and a
jump process. Take for example $X(t) = W(t) + N(t)$, with Brownian motion
$W$ and Poisson process $N$. Give at least two equivalent probability measures
$Q_1$ and $Q_2$, such that $X$ is a $Q_i$-martingale, $i = 1,2$ (see Exercise 10.13).


Exercise 11.7: Give the Radon-Nikodym derivative $\Lambda$ in the change to the
EMM $Q$ in the Black-Scholes model.


Exercise 11.8: A way to derive a PDE for the option price is based on
the fact that $C(t)e^{-rt} = V(t)e^{-rt}$ is a $Q$-local martingale. Obtain the
Black-Scholes PDE for the price of the option using the Black-Scholes model for the
stock price.

Hint: expand $d(C(S(t),t)e^{-rt})$ and equate the coefficient of $dt$ to zero.

Exercise 11.9: Derive the PDE for the price of the option in Heston’s model.


Exercise 11.10: Show that the expected return on stock under the martingale
probability measure $Q$ is the same as on the riskless asset. This is the
reason why the martingale measure $Q$ is also called “risk-neutral” probability
measure.


Exercise 11.11: Assume $S(t)$ evolves according to the Black-Scholes model.
Show that under the EMM $Q_1$, when $S(t)$ is a numeraire,
$d(e^{rt}/S(t)) = \sigma(e^{rt}/S(t))dW_t$, where $W(t) = B(t) - \sigma t$ is a $Q_1$-Brownian motion. Give
the Likelihood $dQ_1/dQ$. Give the SDE for $S(t)$ under $Q_1$.


Exercise 11.12: Derive the Black-Scholes formula by using the stock price
as the numeraire.


Exercise 11.13: A call option on an asset in the foreign market pays at
time $T$, $S(T)(U(T) - K)^+$ in the domestic currency, and its time $t$ price is
$C(t) = e^{-r_d(T-t)}E_{Q_d}\left(S(T)(U(T) - K)^+ \,|\, \mathcal{F}_t\right)$. Taking the numeraire based on the
$Q_d$-martingale $S(t)e^{-(r_d - r_f)t}$, obtain the formula for $C(t)$.

Exercise 11.14: Let $V(t) = a(t)S(t) + b(t)e^{rt}$ be a portfolio, $0 \le t \le T$.
Show that it is self-financing if and only if
$dV(t) = a(t)dS(t) + r(V(t) - a(t)S(t))dt$.

Exercise 11.15: Derive the price of a Lookback call. 


Exercise 11.16: Show that the price of a down-and-in call is given by

$$e^{-rT}\left(\frac{H}{S(0)}\right)^{\frac{2r}{\sigma^2}-1}\left(F\,\Phi\left(\frac{\ln(F/K) + \frac{\sigma^2T}{2}}{\sigma\sqrt{T}}\right) - K\,\Phi\left(\frac{\ln(F/K) - \frac{\sigma^2T}{2}}{\sigma\sqrt{T}}\right)\right),$$

where $F = e^{rT}H^2/S(0)$.


Exercise 11.17: Assume that $S(T)/S$ does not depend on $S$, where $S(T)$
is the price of stock at $T$ and $S = S(0)$. Let $T$ be the exercise time and $K$ the
exercise price of the call option. Show that the price of this option satisfies
the following PDE

$$C = S\frac{\partial C}{\partial S} + K\frac{\partial C}{\partial K}.$$

You may assume all the necessary differentiability. Hence show that the delta
of the option $\frac{\partial C}{\partial S}$ in the Black-Scholes model is given by $\Phi(h(t))$ with $h(t)$ given
by (11.35).


Chapter 12


Applications in Finance: 
Bonds, Rates and Options 


Money invested for different terms T yields a different return corresponding
to the rate of interest R(T). This function is called the yield curve, or the
term structure of interest rates. Every day this curve changes; the time t
curve is denoted by R(t,T). However, the rates are not traded directly; they
are derived from prices of bonds traded on the bond market. This leads to
construction of models for bonds and no-arbitrage pricing for bonds and their
options. We present the main models used in the literature and in applications,
treating in detail the Merton, Vasicek, Heath-Jarrow-Morton (HJM) and
Brace-Gatarek-Musiela (BGM) models. In our treatment we concentrate on
the main mathematical techniques used in such models without going into
details of their calibration.


12.1 Bonds and the Yield Curve 


A \$1 bond with maturity $T$ is a contract that guarantees the holder \$1 at $T$.
Sometimes bonds also pay a certain amount, called a coupon, during the life of
the bond, but for the theory it suffices to consider only bonds without coupons
(zero-coupon bonds). Denote by $P(t,T)$ the price at time $t$ of the bond paying
\$1 at $T$, $P(T,T) = 1$. The yield to maturity of the bond is defined as

$$R(t,T) = -\frac{\ln P(t,T)}{T-t}, \tag{12.1}$$

and as a function of $T$, is called the yield curve at time $t$. Assume also that a
savings account paying at $t$ instantaneous rate $r(t)$, called the spot (or short)
rate, is available. \$1 invested until time $t$ will result in

$$\beta(t) = e^{\int_0^t r(s)ds}. \tag{12.2}$$

To avoid arbitrage between bonds and savings account, a certain relation must
hold between bonds and the spot rate. If there were no uncertainty, then to
avoid arbitrage the following relation must hold

$$P(t,T) = e^{-\int_t^T r(s)ds}, \tag{12.3}$$

since investing either of these amounts at time $t$ results in \$1 at time $T$. When
the rate is random, then $\int_t^T r(s)ds$ is also random and in the future of $t$,
whereas the price $P(t,T)$ is known at time $t$, and the above relation holds
only “on average”, equation (12.5) below.
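For a deterministic rate, relations (12.1) and (12.3) can be illustrated directly. The sketch below is an assumption-laden example: the rate function, the trapezoidal integration and the function names are all illustrative choices, not part of the text.

```python
import math

def bond_price_deterministic(rate, t, T, n=1000):
    """P(t,T) = exp(-integral of r(s) over [t,T]), relation (12.3), trapezoidal rule."""
    h = (T - t) / n
    integral = 0.5 * (rate(t) + rate(T))
    for i in range(1, n):
        integral += rate(t + i * h)
    return math.exp(-integral * h)

def yield_to_maturity(price, t, T):
    """R(t,T) = -ln P(t,T) / (T - t), definition (12.1)."""
    return -math.log(price) / (T - t)

# Illustrative rate curve (assumed): r(s) = 0.02 + 0.01 s
rate = lambda s: 0.02 + 0.01 * s
p = bond_price_deterministic(rate, 0.0, 2.0)
print(p, yield_to_maturity(p, 0.0, 2.0))
```

In this deterministic setting the yield is simply the average of the spot rate over the life of the bond.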

We assume a probability model with a filtration $\mathbb{F} = \{\mathcal{F}_t\}$, $0 \le t \le T^*$,
and adapted processes $P(t,T)$, $t \le T \le T^*$, and $\beta(t)$. For the extension
of the no-arbitrage theory see Artzner and Delbaen (1989), Lamberton and
Lapeyre (1996), Björk (1998), Musiela and Rutkowski (1998), Shiryaev (1999).
In addition to the number of no-arbitrage concepts in continuous time (see
Section 11.3) the continuum of bond maturities $T$ makes the market model
have infinitely many assets and produces further complication. There are
different approaches, including finite portfolios, where at each time only finitely
many bonds are allowed, and infinite, measure-valued portfolios. In all of
the approaches the no-arbitrage condition is formulated with the help of the
following assumption.


EMM Assumption 


There is a probability $Q$ (called the equivalent martingale measure), equivalent
to $P$ (the original “real-world” probability), such that simultaneously for all
$T \le T^*$, the process in $t$, $P(t,T)/\beta(t)$, is a martingale, $0 \le t \le T$.

The martingale property implies that

$$E_Q\left(\frac{1}{\beta(T)} \,\Big|\, \mathcal{F}_t\right) = E_Q\left(\frac{P(T,T)}{\beta(T)} \,\Big|\, \mathcal{F}_t\right) = \frac{P(t,T)}{\beta(t)}, \tag{12.4}$$

where $\mathcal{F}_t$ denotes the information available from the bond prices up to time $t$.
Since $P(T,T) = 1$, we obtain the expression for the price of the bond

$$P(t,T) = E_Q\left(\frac{\beta(t)}{\beta(T)} \,\Big|\, \mathcal{F}_t\right) = E_Q\left(e^{-\int_t^T r(s)ds} \,\Big|\, \mathcal{F}_t\right). \tag{12.5}$$

It shows that the bond can be seen as a derivative on the short rate.


12.2 Models Adapted to Brownian Motion


Here we derive the SDE for bond $P(t,T)$ under $Q$ starting only with the
EMM assumption. The SDE under $P$ is then derived as a consequence of the
predictable representation property. As usual, this property points out the
existence of certain processes, but does not say how to find them.

Consider a probability space with a $P$-Brownian motion $W(t)$ and its
filtration $\mathcal{F}_t$. Assume that the spot rate process $r(t)$ generates $\mathcal{F}_t$, and that the
bond processes $P(t,T)$, for any $T \le T^*$, are adapted. The EMM assumption
gives the price of the bond by the equation (12.5). The martingale

$$\frac{P(t,T)}{\beta(t)} = E_Q\left(e^{-\int_0^T r(s)ds} \,\Big|\, \mathcal{F}_t\right) \tag{12.6}$$

is adapted to the Brownian filtration. By Theorem 10.19 there exists an
adapted process $X(t) = \int_0^t \sigma(s,T)dB(s)$, where $B(t)$ is a $Q$-Brownian motion,
such that

$$d\left(\frac{P(t,T)}{\beta(t)}\right) = \left(\frac{P(t,T)}{\beta(t)}\right)dX(t) = \sigma(t,T)\left(\frac{P(t,T)}{\beta(t)}\right)dB(t).$$

Opening $d(P/\beta)$, we obtain the SDE for $P(t,T)$ under the EMM $Q$

$$\frac{dP(t,T)}{P(t,T)} = r(t)dt + \sigma(t,T)dB(t). \tag{12.7}$$

This is the pricing equation for bonds and their options.
Note that the return on savings account satisfies $d\beta(t)/\beta(t) = r(t)dt$, and the
return on bond has an extra term with a Brownian motion. This makes bonds
$P(t,T)$ riskier than the (also random) savings account $\beta(t)$.

We find the SDE for the bond under the original probability measure $P$
next. Since $Q$ is equivalent to the Wiener measure, by Corollary 10.21 there
is an adapted process $q(t)$, such that

$$W(t) = B(t) + \int_0^t q(s)ds \tag{12.8}$$

is a $P$-Brownian motion, with $dQ/dP = e^{\int_0^T q(t)dW(t) - \frac{1}{2}\int_0^T q^2(t)dt}$. Substituting
$dB(t) = dW(t) - q(t)dt$ into SDE (12.7), we obtain the SDE under $P$

$$\frac{dP(t,T)}{P(t,T)} = \left(r(t) - \sigma(t,T)q(t)\right)dt + \sigma(t,T)dW(t). \tag{12.9}$$

Remark 12.1: It follows from equation (12.9) that $-q(t)$ is the excess return
on the bond above the riskless rate, expressed in standard units; it is known
as “the market price of risk” or “risk premium”. The most common assumption is
that $q(t) = q$ is a constant.


12.3 Models Based on the Spot Rate


A model for term structure and pricing can be developed from a model for the
spot rate. These models are specified under the real world probability measure
$P$, and $r(t)$ is assumed to satisfy

$$dr(t) = m(r(t))dt + \sigma(r(t))dW(t), \tag{12.10}$$

where $W(t)$ is a $P$-Brownian motion, $m$ and $\sigma$ are functions of a real variable.
The bond price $P(t,T)$ satisfies (12.5)

$$P(t,T) = E_Q\left(e^{-\int_t^T r(s)ds} \,\Big|\, \mathcal{F}_t\right).$$

The expectation is under $Q$, and the model (12.10) is specified under $P$.
Therefore the SDE for $r(t)$ under $Q$ is needed. Note that we could express the above
expectation in terms of $E_P$ by

$$E_P\left(e^{-\int_t^T r(s)ds + \int_t^T q(s)dW(s) - \frac{1}{2}\int_t^T q^2(s)ds} \,\Big|\, \mathcal{F}_t\right),$$

but this expectation seems to be intractable even in simple models.
We move between $P$ and the EMM $Q$ by using (12.8) expressed as
$dB(t) = dW(t) - q(t)dt$. Thus under $Q$

$$dr(t) = \left(m(r(t)) + \sigma(r(t))q(t)\right)dt + \sigma(r(t))dB(t). \tag{12.11}$$

The process $r(t)$ is also a diffusion under $Q$, therefore by the Markov property

$$E_Q\left(e^{-\int_t^T r(s)ds} \,\Big|\, \mathcal{F}_t\right) = E_Q\left(e^{-\int_t^T r(s)ds} \,\Big|\, r(t)\right).$$

The last expression satisfies a PDE by the Feynman-Kac formula (Theorem
6.8). Fix $T$, and denote by

$$C(x,t) = E_Q\left(e^{-\int_t^T r(s)ds} \,\Big|\, r(t) = x\right),$$

then by (6.22) it satisfies

$$\frac{1}{2}\sigma^2(x)\frac{\partial^2C}{\partial x^2}(x,t) + \left(m(x) + \sigma(x)q(t)\right)\frac{\partial C}{\partial x}(x,t) + \frac{\partial C}{\partial t}(x,t) - xC(x,t) = 0, \tag{12.12}$$

with the boundary condition $C(x,T) = 1$. The price of the bond is obtained
from this function by

$$P(t,T) = C(r(t),t).$$




A similar PDE with suitable boundary conditions holds for options on bonds.
We list some of the well-known models for the spot rate.

The Merton model
$$dr(t) = \mu\,dt + \sigma\,dW(t). \tag{12.13}$$

The Vasicek model
$$dr(t) = b(a - r(t))dt + \sigma\,dW(t). \tag{12.14}$$

The Dothan model
$$dr(t) = \mu r(t)dt + \sigma\,dW(t). \tag{12.15}$$

The Cox-Ingersoll-Ross (CIR) model
$$dr(t) = b(a - r(t))dt + \sigma\sqrt{r(t)}\,dW(t). \tag{12.16}$$

The Ho-Lee model
$$dr(t) = \mu(t)dt + \sigma\,dW(t). \tag{12.17}$$

The Black-Derman-Toy model
$$dr(t) = \gamma(t)r(t)dt + \sigma(t)dW(t). \tag{12.18}$$

The Hull-White model
$$dr(t) = b(t)(a(t) - r(t))dt + \sigma(t)dW(t). \tag{12.19}$$

The Black-Karasinski model
$$dr(t) = r(t)\left(a(t) - b(t)\ln r(t)\right)dt + \sigma(t)r(t)dW(t). \tag{12.20}$$


The functions $m(r)$ and $\sigma(r)$ involve parameters that need to be estimated.
They are chosen in such a way that values of bonds and options agree as closely
as possible with the values observed in the market. This process is called
calibration, and we do not address it.
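All of the listed models have the form $dr = m\,dt + \sigma\,dW$ with suitable coefficient functions, so a single simulation routine covers them. The following is a minimal Euler scheme sketch; the scheme, the parameter values and the truncation `max(r, 0)` inside the CIR square root are assumptions made for illustration and are not taken from the text.

```python
import math
import random

def euler_short_rate(drift, diffusion, r0, T, n_steps, seed=0):
    """Simulate one path of dr = drift(r,t) dt + diffusion(r,t) dW by the Euler scheme."""
    rng = random.Random(seed)
    dt = T / n_steps
    path = [r0]
    for k in range(n_steps):
        t, r = k * dt, path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))      # Brownian increment over [t, t + dt]
        path.append(r + drift(r, t) * dt + diffusion(r, t) * dw)
    return path

# Illustrative parameters (assumed)
a, b, sigma = 0.05, 0.5, 0.01

# Vasicek model (12.14)
vasicek = euler_short_rate(lambda r, t: b * (a - r), lambda r, t: sigma, 0.02, 1.0, 1000)

# CIR model (12.16); max(r, 0) guards the square root in the discretized scheme
cir = euler_short_rate(lambda r, t: b * (a - r),
                       lambda r, t: sigma * math.sqrt(max(r, 0.0)), 0.02, 1.0, 1000)

print(vasicek[-1], cir[-1])
```

The time-dependent models (Ho-Lee, Hull-White and the others) fit the same interface by letting the coefficient functions depend on `t`.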

In what follows we derive prices of bonds and their options for some models 
by using probability calculations rather than solving PDEs. 


12.4 Merton’s Model and Vasicek’s Model 


Merton’s Model 
The spot rate in the Merton model satisfies SDE (12.13). Its solution is

$$r(t) = r_0 + \mu t + \sigma W(t). \tag{12.21}$$

The savings account is given by

$$\beta(t) = e^{\int_0^t r(s)ds} = e^{r_0t + \mu t^2/2 + \sigma\int_0^t W(s)ds}.$$

Assume the constant price for risk $q(t) = q$, then by (12.11) the SDE for $r(t)$
under $Q$ is

$$dr(t) = (\mu + \sigma q)dt + \sigma\,dB(t).$$

The price of the bond $P(t,T)$ is given by $P(t,T) = E_Q\left(e^{-\int_t^T r(s)ds} \,|\, \mathcal{F}_t\right)$.
Since the conditional expectation given $\mathcal{F}_t$ is needed, use the decomposition
of $r(s)$ into an $\mathcal{F}_t$-measurable part and an $\mathcal{F}_t$-independent part,

$$r(s) = r(t) + (\mu + \sigma q)(s-t) + \sigma(B(s) - B(t)) = r(t) + (\mu + \sigma q)(s-t) + \sigma\hat{B}(s-t),$$

with $\hat{B}(u) = B(t+u) - B(t)$ a Brownian motion independent of $\mathcal{F}_t$. Then

$$P(t,T) = e^{-r(t)(T-t) - (\mu+\sigma q)(T-t)^2/2}\,E_Q\left(e^{-\sigma\int_0^{T-t}\hat{B}(u)du}\right) = e^{-r(t)(T-t) - (\mu+\sigma q)(T-t)^2/2 + \sigma^2(T-t)^3/6}, \tag{12.22}$$

where we used that the distribution of the random integral $\int_0^{T-t}\hat{B}(u)du$ is
$N(0, (T-t)^3/3)$, see Example 3.6. The yield curve is given by

$$R(t,T) = -\frac{\ln P(t,T)}{T-t} = r(t) + \frac{1}{2}(\mu + \sigma q)(T-t) - \frac{1}{6}\sigma^2(T-t)^2.$$

Since $r(t)$ has a Normal distribution, so does $R(t,T)$, so that $P(t,T)$ is
Lognormal. Pricing of a call option on the bond in this model is covered in
Exercise 12.2.

Note that the yields for different maturities differ by a deterministic quan- 
tity. Therefore R(t, T1) and R(t, T2) are perfectly correlated. This is taken as 
a shortcoming of the model. 
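The Merton bond price (12.22) and the resulting yield curve can be written down directly. The sketch below uses assumed parameter values and hypothetical function names; `tau` denotes $T - t$. It also illustrates the remark about perfectly correlated yields: the spread between two maturities does not depend on the current rate.

```python
import math

def merton_bond_price(r_t, mu, sigma, q, tau):
    """Bond price (12.22) in the Merton model with constant market price of risk q."""
    return math.exp(-r_t * tau - 0.5 * (mu + sigma * q) * tau ** 2
                    + sigma ** 2 * tau ** 3 / 6.0)

def merton_yield(r_t, mu, sigma, q, tau):
    """Yield R(t,T) implied by (12.22)."""
    return r_t + 0.5 * (mu + sigma * q) * tau - sigma ** 2 * tau ** 2 / 6.0

# Illustrative parameters (assumed)
mu, sigma, q = 0.01, 0.02, 0.1
for r_t in (0.02, 0.05):
    spread = merton_yield(r_t, mu, sigma, q, 5.0) - merton_yield(r_t, mu, sigma, q, 1.0)
    print(r_t, spread)   # the spread is the same for both values of r_t
```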


Vasicek’s Model 
The spot rate in Vasicek’s model satisfies SDE (12.14). Its solution is

$$r(t) = a - e^{-bt}(a - r(0)) + \sigma\int_0^t e^{-b(t-s)}dW(s). \tag{12.23}$$

We derive the solution below, but note that it is easy to check that (12.23)
is indeed a solution, see also the Langevin SDE Example 5.6, equation (5.15).
Writing the SDE (12.14) in the integral form and taking expectations (it is easy
to see that the Itô integral has zero mean, and interchanging the integral and
the expectation is justified by Fubini’s theorem), we have

$$Er(t) - Er(0) = \int_0^t b(a - Er(s))ds. \tag{12.24}$$

Put $h(t) = Er(t)$. Differentiating, we obtain

$$h'(t) = b(a - h(t)).$$

This equation is solved by separating variables. Integrating from 0 to $t$, and
performing the change of variable $u = h(s)$ we obtain

$$Er(t) = a - e^{-bt}(a - r(0)). \tag{12.25}$$

Let now

$$X(t) = r(t) - Er(t) = r(t) - h(t). \tag{12.26}$$

$X(t)$, clearly, satisfies

$$dX(t) = dr(t) - dh(t) = -bX(t)dt + \sigma\,dW(t), \tag{12.27}$$

with the initial condition $X(0) = 0$. But this is the equation for the
Ornstein-Uhlenbeck process. By (5.13), $X(t) = \sigma\int_0^t e^{-b(t-s)}dW(s)$, and (12.23) follows
from the equations (12.26) and (12.25).

We make two observations next. First, the long-term mean is $a$,

$$\lim_{t\to\infty}Er(t) = a.$$

Second, the process $X(t) = r(t) - Er(t)$ reverts to zero, hence $r(t)$ reverts
to its mean: if $r(t)$ is above its mean, then the drift is negative, making $r(t)$
decrease; and if $r(t)$ is below its mean, then the drift is positive, making $r(t)$
increase. Mean reversion is a desirable property in modelling of rates.


To proceed to calculation of bond prices $P(t,T) = E_Q\left(e^{-\int_t^T r(s)ds} \,|\, \mathcal{F}_t\right)$,
further assumptions on the market price of risk $q(t)$ are needed. Assume
$q(t) = q$ is a constant.

We move between $P$ and the EMM $Q$ by using (12.8), which states
$dB(t) = dW(t) - q\,dt$. Therefore the equation for $r(t)$ under $Q$ is given by
(12.14) with $a$ replaced by

$$a^* = a + \frac{\sigma q}{b}. \tag{12.28}$$

To calculate the $Q$-conditional distribution of $\int_t^T r(s)ds$ given $\mathcal{F}_t$ needed for
the bond price, observe that by the Markov property of the solution, for $s > t$
the process starts at $r(t)$ and runs for $s - t$, giving

$$r(s) = a^* - e^{-b(s-t)}(a^* - r(t)) + \sigma e^{-b(s-t)}\int_0^{s-t}e^{bu}dB(u), \tag{12.29}$$

with a $Q$-Brownian motion $B(u)$, independent of $\mathcal{F}_t$. Thus $r(s)$ is a Gaussian
process, since the function in the Itô integral is deterministic (see Theorem
4.11). The conditional distribution of $\int_t^T r(s)ds$ given $\mathcal{F}_t$ is the same as that
given $r(t)$ and is a Normal distribution. Thus the calculation of the bond price
involves the expectation of a Lognormal, $E_Q\left(e^{-\int_t^T r(s)ds} \,|\, r(t)\right)$. Thus

$$P(t,T) = e^{-\mu_1 + \sigma_1^2/2}, \tag{12.30}$$


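where $\mu_1$ and $\sigma_1^2$ are the conditional mean and variance of $\int_t^T r(s)ds$ given
$r(t)$. Using (12.29) and calculating directly, or using (12.24) in the form
$h = a^* - h'/b$, we have

$$\mu_1 = E_Q\left(\int_t^T r(s)ds \,\Big|\, r(t)\right) = \int_t^T E_Q(r(s)\,|\,r(t))ds = a^*(T-t) - \frac{1}{b}(a^* - r(t))\left(1 - e^{-b(T-t)}\right). \tag{12.31}$$

To calculate $\sigma_1^2$, use the representation for $r(s)$ conditional on $r(t)$, equation
(12.29),

$$\sigma_1^2 = Cov\left(\int_t^T r(s)ds, \int_t^T r(u)du\right) = \int_t^T\int_t^T Cov(r(s), r(u))\,ds\,du.$$

Now it is not hard to calculate, by the formula for the covariance of Itô Gaussian
integrals (4.26) or (4.27), that for $s > u$

$$Cov(r(s), r(u)) = \frac{\sigma^2}{2b}\left(e^{b(u-s)} - e^{-b(u+s-2t)}\right).$$

Putting this expression (and a similar one for when $s < u$) in the double
integral, we obtain

$$\sigma_1^2 = \frac{\sigma^2}{b^2}(T-t) - \frac{\sigma^2}{2b^3}\left(3 - 4e^{-b(T-t)} + e^{-2b(T-t)}\right).$$

Thus denoting $R(\infty) = a^* - \frac{\sigma^2}{2b^2}$,

$$P(t,T) = \exp\left\{\frac{1}{b}\left(1 - e^{-b(T-t)}\right)(R(\infty) - r(t)) - (T-t)R(\infty) - \frac{\sigma^2}{4b^3}\left(1 - e^{-b(T-t)}\right)^2\right\}. \tag{12.32}$$

From the formula (12.32) the yield to maturity is obtained by (12.1). Vasicek
(1977) obtained the formula (12.32) by solving the PDE (12.12).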
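The closed form (12.32) can be cross-checked against the Lognormal representation (12.30), using the conditional mean and variance of the integrated rate. The sketch below assumes illustrative parameter values and hypothetical function names; `tau` is $T - t$ and `a_star` is the risk-adjusted level $a^*$ of (12.28).

```python
import math

def vasicek_mean_var(r_t, a_star, b, sigma, tau):
    """Conditional mean and variance of the integral of r over [t, T] under Q."""
    decay = math.exp(-b * tau)
    mean = a_star * tau - (a_star - r_t) * (1.0 - decay) / b
    var = (sigma ** 2 / b ** 2) * tau - (sigma ** 2 / (2.0 * b ** 3)) * (
        3.0 - 4.0 * decay + decay ** 2)
    return mean, var

def vasicek_bond_price(r_t, a_star, b, sigma, tau):
    """Bond price in the form (12.32)."""
    r_inf = a_star - sigma ** 2 / (2.0 * b ** 2)
    g = (1.0 - math.exp(-b * tau)) / b
    return math.exp(g * (r_inf - r_t) - tau * r_inf - sigma ** 2 * g ** 2 / (4.0 * b))

# Illustrative parameters (assumed)
r_t, a_star, b, sigma, tau = 0.03, 0.05, 0.5, 0.02, 5.0
mean, var = vasicek_mean_var(r_t, a_star, b, sigma, tau)
print(vasicek_bond_price(r_t, a_star, b, sigma, tau), math.exp(-mean + 0.5 * var))
```

The two printed numbers coincide up to rounding, which confirms that (12.32) is the Lognormal expectation $e^{-\mu_1 + \sigma_1^2/2}$ written in a different form.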

Since the price of bonds is Lognormal with known mean and variance, a 
closed form expression for the price of an option on the bond can be obtained. 
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12.5 Heath-Jarrow-Morton (HJM) Model 


The class of models suggested by Heath, Jarrow, and Morton (1992) is based 
on modelling the forward rates. These rates are implied by bonds with different 
maturities. The forward rates $f(t,T)$, $t \le T \le T^*$, are defined by the
relation

$$P(t,T) = e^{-\int_t^T f(t,u)\,du}. \tag{12.33}$$

Thus the forward rate $f(t,T)$, $t \le T$, is the (continuously compounding) rate
at time $T$ as seen from time $t$,

$$f(t,T) = -\frac{\partial \ln P(t,T)}{\partial T}.$$
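
The two directions of relation (12.33) can be illustrated numerically. The sketch below is an illustration only: the discount curve `toy_bond` is a made-up example, and the finite-difference and midpoint-rule helpers are assumptions introduced here, not part of the text.

```python
import math

def forward_rate(bond_price, t, T, h=1e-5):
    """f(t,T) = -d/dT ln P(t,T), approximated by a central finite difference in T."""
    return -(math.log(bond_price(t, T + h)) - math.log(bond_price(t, T - h))) / (2.0 * h)

def bond_from_forwards(fwd, t, T, n=2000):
    """P(t,T) = exp(-integral_t^T f(t,u) du), midpoint rule with n subintervals."""
    du = (T - t) / n
    integral = sum(fwd(t, t + (i + 0.5) * du) for i in range(n)) * du
    return math.exp(-integral)

def toy_bond(t, T):
    """Hypothetical discount curve: yield of 3% + 0.5% * (T - t)."""
    tau = T - t
    return math.exp(-(0.03 + 0.005 * tau) * tau)

# For this curve ln P = -(0.03 tau + 0.005 tau^2), so f(0, 2) = 0.03 + 0.01 * 2 = 0.05.
f_2y = forward_rate(toy_bond, 0.0, 2.0)
# Integrating the forward curve recovers the bond price.
p_back = bond_from_forwards(lambda t, u: forward_rate(toy_bond, t, u), 0.0, 2.0)
```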


The spot rate is $r(t) = f(t,t)$. Consequently the savings account $\beta(t)$ grows
according to

$$\beta(t) = e^{\int_0^t r(s)\,ds}. \tag{12.34}$$

The assumption of the HJM model is that for a fixed $T$, the forward rate $f(t,T)$
is a diffusion in the $t$ variable, namely

$$df(t,T) = \alpha(t,T)\,dt + \sigma(t,T)\,dW(t), \tag{12.35}$$

where $W(t)$ is a P-Brownian motion and the processes $\alpha(t,T)$ and $\sigma(t,T)$ are adapted
and continuous. The coefficients $\alpha(t,T)$, $\sigma(t,T)$ and the initial conditions $f(0,T)$ are the
parameters of the model.


EMM assumption 


There exists an equivalent martingale probability measure (EMM), $Q \sim P$,
such that for all $T \le T^*$, $P(t,T)/\beta(t)$ is a Q-martingale. Assuming the existence of
Q, we find equations for the bonds and the rates under Q.


Bonds and Rates under Q and the No-arbitrage Condition 


The EMM assumption implies that $\alpha(t,T)$ is determined by $\sigma(t,T)$ when the SDE
for forward rates is considered under Q.

Theorem 12.1 Assume the forward rates satisfy SDE (12.35), the EMM assumption
holds, i.e. $P(t,T)/\beta(t)$ is a Q-martingale, and all the conditions on the coefficients
of the SDE (12.35) needed for the analysis below hold. Let

$$\tau(t,T) = \int_t^T \sigma(t,u)\,du. \tag{12.36}$$

Then

$$\alpha(t,T) = \sigma(t,T)\,\tau(t,T). \tag{12.37}$$

Moreover, under the EMM Q the forward rates satisfy the SDE with a Q-Brownian
motion $B$

$$df(t,T) = \sigma(t,T)\,\tau(t,T)\,dt + \sigma(t,T)\,dB(t). \tag{12.38}$$

Conversely, if the forward rates satisfy the SDE (12.38) then $P(t,T)/\beta(t)$ is a
Q-local martingale. If the appropriate integrability conditions hold, then it is a
Q-martingale.


PROOF: The idea is simple: find $d\bigl(P(t,T)/\beta(t)\bigr)$ and equate the coefficient of $dt$
to zero. Let

$$X(t) = \ln P(t,T) = -\int_t^T f(t,u)\,du. \tag{12.39}$$

By Itô's formula

$$d\Bigl(\frac{P(t,T)}{\beta(t)}\Bigr) = \frac{P(t,T)}{\beta(t)}\Bigl(dX(t) + \frac{1}{2}d[X,X](t) - r(t)\,dt\Bigr). \tag{12.40}$$

It is not hard to show, see Example 12.1, that

$$dX(t) = -d\Bigl(\int_t^T f(t,u)\,du\Bigr) = -A(t,T)\,dt - \tau(t,T)\,dW(t), \tag{12.41}$$

where $A(t,T) = -r(t) + \int_t^T \alpha(t,u)\,du$. Thus

$$d\Bigl(\frac{P(t,T)}{\beta(t)}\Bigr) = \frac{P(t,T)}{\beta(t)}\Bigl(\Bigl(-\int_t^T \alpha(t,u)\,du + \frac{1}{2}\tau^2(t,T)\Bigr)dt - \tau(t,T)\,dW(t)\Bigr). \tag{12.42}$$

By Girsanov's theorem

$$dW(t) + \Bigl(\frac{\int_t^T \alpha(t,u)\,du}{\tau(t,T)} - \frac{1}{2}\tau(t,T)\Bigr)dt = dB(t), \tag{12.43}$$

for a Q-Brownian motion $B(t)$. This gives the SDE for the discounted bond
under Q

$$d\Bigl(\frac{P(t,T)}{\beta(t)}\Bigr) = -\frac{P(t,T)}{\beta(t)}\,\tau(t,T)\,dB(t). \tag{12.44}$$

Thus if the model is specified under Q, $W(t) = B(t)$, then

$$\int_t^T \alpha(t,u)\,du = \frac{1}{2}\tau^2(t,T). \tag{12.45}$$

Differentiating in $T$ gives the condition (12.37). The SDE (12.38) follows from
(12.37).


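Corollary 12.2 The bonds satisfy the following equations under Q for $t \le T$

$$dP(t,T) = P(t,T)\bigl(r(t)\,dt - \tau(t,T)\,dB(t)\bigr), \quad\text{and} \tag{12.46}$$

$$P(t,T) = P(0,T)\,e^{-\int_0^t \tau(s,T)\,dB(s) - \frac{1}{2}\int_0^t \tau^2(s,T)\,ds + \int_0^t r(s)\,ds}. \tag{12.47}$$

PROOF: Equation (12.44) shows that $P(t,T)/\beta(t)$ is the stochastic exponential of
$-\int_0^t \tau(s,T)\,dB(s)$. Hence

$$\frac{P(t,T)}{\beta(t)} = \frac{P(0,T)}{\beta(0)}\,\mathcal{E}\Bigl(-\int_0^t \tau(s,T)\,dB(s)\Bigr) = P(0,T)\,e^{-\int_0^t \tau(s,T)\,dB(s) - \frac{1}{2}\int_0^t \tau^2(s,T)\,ds}. \tag{12.48}$$

Since $\beta(t) = e^{\int_0^t r(s)\,ds}$, the bond's price is given by (12.47). The SDE (12.46)
follows, as the stochastic exponential SDE of $-\int_0^t \tau(s,T)\,dB(s) + \int_0^t r(s)\,ds$.

Using (12.47) for $T_1$ and $T_2$, we obtain for $t \le T_1 \le T_2$, by eliminating $\int_0^t r(s)\,ds$:

Corollary 12.3 A relation between bonds with different maturities is given by

$$P(t,T_2) = P(t,T_1)\,\frac{P(0,T_2)}{P(0,T_1)}\,e^{-\int_0^t (\tau(s,T_2) - \tau(s,T_1))\,dB(s) - \frac{1}{2}\int_0^t (\tau^2(s,T_2) - \tau^2(s,T_1))\,ds}. \tag{12.49}$$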


Remark 12.2: 

1. Equation (12.37) is known as the no-arbitrage or EMM condition. 

2. The effect of the change to the EMM Q is in the change of drift in the SDE 
for the forward rates, from (12.35) to (12.38). 

3. The volatility of the bond is $\tau(t,T) = \int_t^T \sigma(t,s)\,ds$, the integrated forward
volatilities, by (12.46).

4. The expression (12.47) for $P(t,T)$ includes the Itô integral $\int_0^t \tau(s,T)\,dB(s)$,
which is not observed directly. It can be obtained from $\int_t^T f(t,u)\,du$ using
(12.38) and interchanging integrals. Integrated processes and interchanging
integrals can be justified rigorously, see Heath et al. (1992), and in greater
generality Hamza et al. (2002).

5. The vector case of $W(t)$ and $\sigma(t,T)$ in (12.35) gives similar formulae by
using the notation $\sigma(t,T)\,dW(t) = \sum_{i=1}^d \sigma_i(t,T)\,dW_i(t)$ for the scalar product and
replacing $\sigma^2$ by the norm $|\sigma|^2$, see Exercise 5.9.
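
The no-arbitrage condition (12.37) says that under Q the drift of the forward rate is fully determined by the volatility. A minimal numerical sketch follows; the exponentially damped volatility, the parameter values and the function names are assumptions chosen for illustration, not taken from the text.

```python
import math

def bond_volatility(sigma, t, T, n=2000):
    """tau(t,T) = integral_t^T sigma(t,u) du, by the midpoint rule (12.36)."""
    du = (T - t) / n
    return sum(sigma(t, t + (i + 0.5) * du) for i in range(n)) * du

def hjm_drift(sigma, t, T):
    """Drift of f(t,T) under Q: alpha(t,T) = sigma(t,T) * tau(t,T), condition (12.37)."""
    return sigma(t, T) * bond_volatility(sigma, t, T)

# Assumed example volatility: sigma0 * exp(-b (T - t)).
SIGMA0, B = 0.01, 0.5

def damped_sigma(t, T):
    return SIGMA0 * math.exp(-B * (T - t))

alpha_num = hjm_drift(damped_sigma, 1.0, 4.0)
# Closed form for this choice, with x = T - t: sigma0^2 exp(-b x) (1 - exp(-b x)) / b.
x = 3.0
alpha_exact = SIGMA0**2 * math.exp(-B * x) * (1.0 - math.exp(-B * x)) / B

# Constant volatility (the Ho-Lee case): alpha = sigma^2 (T - t).
alpha_const = hjm_drift(lambda t, T: 0.02, 0.0, 3.0)
```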
Example 12.1: Differentiation of an integral of the form $\int_t^T f(t,s)\,ds$. We show
that (12.41) holds. Introduce $G(u,t) = \int_u^T f(t,s)\,ds$. We are interested in $dG(t,t)$;
it is found by

$$dG(t,t) = \Bigl(\frac{\partial}{\partial u}G(t,t) + \frac{\partial}{\partial t}G(t,t)\Bigr)dt,$$

or

$$d\Bigl(\int_t^T f(t,v)\,dv\Bigr) = -f(t,t)\,dt + \int_t^T df(t,v)\,dv.$$

Now, $f(t,t) = r(t)$ and, using the model for $df(t,v)$, we obtain

$$d\Bigl(\int_t^T f(t,v)\,dv\Bigr) = -r(t)\,dt + \Bigl(\int_t^T \alpha(t,v)\,dv\Bigr)dt + \Bigl(\int_t^T \sigma(t,v)\,dv\Bigr)dW(t).$$

Exchange of the integrals is justified in Exercise 12.3.


Example 12.2: (Ho-Lee model)
Consider the SDE for forward rates (12.38) with the simplest choice of constant
volatilities $\sigma(t,T) = \sigma = \text{const}$. Then $\tau(t,T) = \int_t^T \sigma(t,u)\,du = \sigma(T-t)$. Thus

$$df(t,T) = \sigma^2(T-t)\,dt + \sigma\,dB(t),$$

$$f(t,T) = f(0,T) + \sigma^2 t\Bigl(T - \frac{t}{2}\Bigr) + \sigma B(t), \quad\text{and}$$

$$r(t) = f(t,t) = f(0,t) + \sigma^2\frac{t^2}{2} + \sigma B(t).$$

They contain the Brownian motion, which is not observed directly. Eliminating it,
$\sigma B(t) = r(t) - f(0,t) - \sigma^2 t^2/2$, we have

$$f(t,T) = r(t) + \sigma^2 t(T-t) + f(0,T) - f(0,t).$$

This equation shows that forward rates differ by a deterministic term,
$f(t,T_1) - f(t,T_2) = f(0,T_1) - f(0,T_2) + \sigma^2 t(T_1 - T_2)$; therefore they are perfectly correlated.
$r(t)$ and $f(t,T)$ are also perfectly correlated. This seems to contradict observations
in financial markets.

The bond in terms of the forward rates (12.33) is

$$P(t,T) = e^{-\int_t^T f(t,u)\,du} = e^{-\int_t^T f(0,u)\,du - \sigma^2\int_t^T t(u - t/2)\,du - \sigma B(t)(T-t)}.$$

Using $-\int_t^T f(0,u)\,du = \ln P(0,T) - \ln P(0,t)$,

$$P(t,T) = \frac{P(0,T)}{P(0,t)}\,e^{-\sigma^2 tT(T-t)/2 - \sigma B(t)(T-t)}.$$

Eliminating $B(t)$, we obtain the equation for the bond in terms of the spot rate, from
which the yield curve is determined:

$$P(t,T) = \frac{P(0,T)}{P(0,t)}\,e^{-(T-t)r(t) - \sigma^2 t(T-t)^2/2 + (T-t)f(0,t)}.$$
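
The two expressions for the Ho-Lee bond price, one in terms of $B(t)$ and one in terms of the spot rate $r(t)$, can be compared directly. The sketch below assumes a flat initial curve at 4% and a single hypothetical value of $B(t)$; both are illustrative choices, not data from the text.

```python
import math

def ho_lee_bond_from_bm(p0, t, T, sigma, b_t):
    """P(t,T) in terms of the value b_t of the driving Brownian motion B(t)."""
    return p0(T) / p0(t) * math.exp(-sigma**2 * t * T * (T - t) / 2.0 - sigma * b_t * (T - t))

def ho_lee_bond_from_spot(p0, f0, t, T, sigma, r_t):
    """P(t,T) in terms of the spot rate r(t)."""
    exponent = -(T - t) * r_t - sigma**2 * t * (T - t)**2 / 2.0 + (T - t) * f0(t)
    return p0(T) / p0(t) * math.exp(exponent)

# Assumed flat initial curve: P(0,T) = exp(-0.04 T), hence f(0,T) = 0.04.
def p0(T):
    return math.exp(-0.04 * T)

def f0(T):
    return 0.04

sigma, t, T, b_t = 0.01, 1.0, 3.0, 0.7              # b_t: one hypothetical value of B(t)
r_t = f0(t) + sigma**2 * t**2 / 2.0 + sigma * b_t   # spot rate implied by that value
price_bm = ho_lee_bond_from_bm(p0, t, T, sigma, b_t)
price_spot = ho_lee_bond_from_spot(p0, f0, t, T, sigma, r_t)
```

The two prices coincide, which confirms the elimination of $B(t)$ carried out in the example.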
The graphs below illustrate possible curves produced by the Ho-Lee model.

[Figure 12.1: Forward curves and Yield curves. Left panel: forward rates 30 days apart; right panel: yield curves 30 days apart. Horizontal axes in years.]

[Figure 12.2: Bonds $P(t,T_i)$ for two maturities $T_1$ and $T_2$ one year apart. Bond prices $P(t,T)$ plotted against time in years.]
12.6 Forward Measures. Bond as a Numeraire


Options on a Bond 


An attainable claim is one that can be replicated by a self-financing portfolio 
of bonds. The predictable representation property gives the arbitrage-free 
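price at time $t$ of an attainable European claim with payoff $X$ at maturity $S$,
$t \le S \le T$,

$$C(t) = E_Q\Bigl(\frac{\beta(t)}{\beta(S)}X \Bigm| \mathcal{F}_t\Bigr) = E_Q\bigl(e^{-\int_t^S r(u)\,du}X \mid \mathcal{F}_t\bigr). \tag{12.50}$$

For example, the price at time $t$ of a European put with maturity $S$ and strike
$K$ on a bond with maturity $T$ (the right to sell the bond at time $S$ for $K$) is
given by

$$C(t) = E_Q\Bigl(\frac{\beta(t)}{\beta(S)}\bigl(K - P(S,T)\bigr)^+ \Bigm| \mathcal{F}_t\Bigr). \tag{12.51}$$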


This option is in effect a cap on the interest rate over [S, T] (see Section 12.7). 


Forward Measures 


Taking the bond, rather than the savings account, as a numeraire allows us to 
simplify option pricing formulae. 


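Definition 12.4 The measure $Q_T$, called the T-forward measure, is obtained
when the T-bond is a numeraire, i.e. $\beta(t)/P(t,T)$, $t \le T$, is a $Q_T$-martingale.

Theorem 12.5 The forward measure $Q_T$ is defined by

$$\Lambda(T) = \frac{dQ_T}{dQ} = \frac{1}{P(0,T)\beta(T)}. \tag{12.52}$$

The price of an attainable claim $X$ at time $t$ under different numeraires is
related by the formula

$$C(t) = E_Q\Bigl(\frac{\beta(t)}{\beta(T)}X \Bigm| \mathcal{F}_t\Bigr) = P(t,T)\,E_{Q_T}(X \mid \mathcal{F}_t). \tag{12.53}$$

PROOF: follows directly from the change of numeraire Theorem 11.17. By the
EMM assumption $P(t,T)/\beta(t)$, $t \le T$, is a positive Q-martingale. $Q_T$ is the measure
which that theorem associates with the numeraire $P(t,T)$, with

$$\frac{dQ_T}{dQ} = \Lambda(T) = \frac{P(T,T)\beta(0)}{P(0,T)\beta(T)} = \frac{1}{P(0,T)\beta(T)},$$

which is (12.52). By equation (11.49)

$$C(t) = E_Q\Bigl(\frac{\beta(t)}{\beta(T)}X \Bigm| \mathcal{F}_t\Bigr) = E_{Q_T}\Bigl(\frac{P(t,T)}{P(T,T)}X \Bigm| \mathcal{F}_t\Bigr) = P(t,T)\,E_{Q_T}(X \mid \mathcal{F}_t).$$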
Remark 12.3: $E_{Q_T}(X \mid \mathcal{F}_t) = F(t)$ is called the forward price at $t$ for the date
$T$ of an attainable claim $X$. These prices are $Q_T$-martingales, see Doob-Levy
martingale, Theorem 7.9.


Forward Measure in HJM 
Theorem 12.6 The process

$$dB^T(t) = dB(t) + \tau(t,T)\,dt \tag{12.54}$$

is a $Q_T$-Brownian motion. Moreover, the forward rates $f(t,T)$ and the
bond $P(t,T)$ satisfy the following SDEs under $Q_T$:

$$df(t,T) = \sigma(t,T)\,dB^T(t), \tag{12.55}$$

$$dP(t,T) = P(t,T)\bigl((r(t) + \tau^2(t,T))\,dt - \tau(t,T)\,dB^T(t)\bigr). \tag{12.56}$$

PROOF: The T-forward martingale measure $Q_T$ is obtained by using $\Lambda(T) = \frac{1}{P(0,T)\beta(T)}$. Then, by (12.48),

$$\Lambda(t) = E_Q\bigl(\Lambda(T) \mid \mathcal{F}_t\bigr) = \frac{P(t,T)}{P(0,T)\beta(t)} = \mathcal{E}\Bigl(-\int_0^t \tau(s,T)\,dB(s)\Bigr).$$

Therefore by Girsanov's theorem $dB^T(t) = dB(t) + \tau(t,T)\,dt$ is a $Q_T$-Brownian
motion. Under Q the forward rates satisfy the SDE (12.38)

$$df(t,T) = \sigma(t,T)\,\tau(t,T)\,dt + \sigma(t,T)\,dB(t).$$

The SDE under $Q_T$ is obtained by replacing $B(t)$ with $B^T(t)$,

$$df(t,T) = \sigma(t,T)\bigl(\tau(t,T)\,dt + dB(t)\bigr) = \sigma(t,T)\,dB^T(t).$$

The SDE under $Q_T$ for the bond prices is obtained similarly from the SDE
(12.46) under Q.

The above result shows that under the forward measure $f(t,T)$ is a $Q_T$-local
martingale. It also shows that $\tau(t,T)$ is the volatility of $P(t,T)$ under the
forward measure $Q_T$.


Distributions of the Bond in HJM with Deterministic Volatilities 


The following result, needed for option pricing, gives the conditional distribution
of $P(T,T+\delta)$. As a corollary, distributions of $P(t,T)$ are obtained.
Theorem 12.7 Suppose that the forward rates satisfy the HJM model with
a deterministic $\sigma(t,T)$. Then for any $0 \le t < T$ and any $\delta > 0$, the
Q-conditional distribution of $P(T,T+\delta)$ given $\mathcal{F}_t$ is Lognormal with mean

$$E_Q\bigl(\ln P(T,T+\delta) \mid \mathcal{F}_t\bigr) = \ln\frac{P(t,T+\delta)}{P(t,T)} - \frac{1}{2}\int_t^T \bigl(\tau^2(s,T+\delta) - \tau^2(s,T)\bigr)\,ds,$$

and variance

$$\gamma^2(t) = \mathrm{Var}\bigl(\ln P(T,T+\delta) \mid \mathcal{F}_t\bigr) = \int_t^T \bigl(\tau(s,T+\delta) - \tau(s,T)\bigr)^2\,ds. \tag{12.57}$$

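PROOF: By letting $t = T_1 = T$ and $T_2 = T + \delta$ in equation (12.49), the
expression for $P(T,T+\delta)$ is obtained,

$$P(T,T+\delta) = \frac{P(0,T+\delta)}{P(0,T)}\,e^{-\int_0^T (\tau(s,T+\delta) - \tau(s,T))\,dB(s) - \frac{1}{2}\int_0^T (\tau^2(s,T+\delta) - \tau^2(s,T))\,ds}. \tag{12.58}$$

To find its conditional distribution given $\mathcal{F}_t$, separate the $\mathcal{F}_t$-measurable term,

$$P(T,T+\delta) = \frac{P(0,T+\delta)}{P(0,T)}\,e^{-\int_0^t (\tau(s,T+\delta) - \tau(s,T))\,dB(s) - \frac{1}{2}\int_0^t (\tau^2(s,T+\delta) - \tau^2(s,T))\,ds}$$
$$\qquad\times\, e^{-\int_t^T (\tau(s,T+\delta) - \tau(s,T))\,dB(s) - \frac{1}{2}\int_t^T (\tau^2(s,T+\delta) - \tau^2(s,T))\,ds}$$
$$= \frac{P(t,T+\delta)}{P(t,T)}\,e^{-\int_t^T (\tau(s,T+\delta) - \tau(s,T))\,dB(s) - \frac{1}{2}\int_t^T (\tau^2(s,T+\delta) - \tau^2(s,T))\,ds}. \tag{12.59}$$

If $\tau(t,T)$ is non-random, then the exponential term is independent of $\mathcal{F}_t$. Hence
the desired conditional distribution given $\mathcal{F}_t$ is Lognormal with the mean and
variance as stated.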
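
The mean and variance in Theorem 12.7 are plain time integrals of the deterministic bond volatility, so they are easy to evaluate numerically. The following sketch is illustrative only: the helper name, the midpoint rule and the Ho-Lee test case $\tau(s,U) = \sigma(U - s)$ with made-up parameters are assumptions.

```python
import math

def log_bond_moments(tau, ratio, t, T, delta, n=2000):
    """Mean and variance of ln P(T, T+delta) given F_t under Q (Theorem 12.7).

    tau(s, U) is the deterministic bond volatility and
    ratio = P(t, T+delta) / P(t, T) is observed at time t.
    """
    ds = (T - t) / n
    drift_integral = 0.0
    variance = 0.0
    for i in range(n):
        s = t + (i + 0.5) * ds            # midpoint of the i-th subinterval
        long_vol, short_vol = tau(s, T + delta), tau(s, T)
        drift_integral += (long_vol**2 - short_vol**2) * ds
        variance += (long_vol - short_vol)**2 * ds
    return math.log(ratio) - 0.5 * drift_integral, variance

# Ho-Lee bond volatility tau(s, U) = sigma (U - s), with illustrative values.
SIGMA = 0.01
mean, var = log_bond_moments(lambda s, U: SIGMA * (U - s), 0.97, 0.0, 2.0, 0.5)

# Closed forms in this case: variance = sigma^2 delta^2 (T - t),
# drift integral = sigma^2 (delta (T - t)^2 + delta^2 (T - t)).
var_exact = SIGMA**2 * 0.5**2 * 2.0
mean_exact = math.log(0.97) - 0.5 * SIGMA**2 * (0.5 * 2.0**2 + 0.5**2 * 2.0)
```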


Corollary 12.8 The conditional distribution of $P(T,T+\delta)$ given $\mathcal{F}_t$ under
the forward measure $Q_{T+\delta}$ is Lognormal with mean

$$E_{Q_{T+\delta}}\bigl(\ln P(T,T+\delta) \mid \mathcal{F}_t\bigr) = \ln\frac{P(t,T+\delta)}{P(t,T)} + \frac{1}{2}\int_t^T \bigl(\tau(s,T+\delta) - \tau(s,T)\bigr)^2\,ds,$$

and variance $\gamma^2(t)$ (12.57).

PROOF: Use equation (12.59) for the conditional representation, together
with (12.54), a $Q_{T+\delta}$-Brownian motion $dB^{T+\delta}(t) = dB(t) + \tau(t,T+\delta)\,dt$, to
have

$$P(T,T+\delta) = \frac{P(t,T+\delta)}{P(t,T)}\,e^{-\int_t^T (\tau(s,T+\delta) - \tau(s,T))\,dB^{T+\delta}(s) + \frac{1}{2}\int_t^T (\tau(s,T+\delta) - \tau(s,T))^2\,ds}. \tag{12.60}$$


Applying Theorem 12.7 and its corollary with conditioning at time 0, with $T$ replaced by $t$ and $T + \delta$ replaced by $T$, we obtain

Corollary 12.9 In the case of constant volatilities the Q-distribution and $Q_T$-distribution
of $P(t,T)$ are Lognormal.


The Lognormality of P(t,T) can also be obtained directly from equations 
(12.47) and (12.38). 


12.7 Options, Caps and Floors 


We give common options on interest rates and show how they relate to options 
on bonds. 


Cap and Caplets 


A cap is a contract that gives its holder the right to pay the smaller of two
rates of interest: the floating rate and the rate $k$ specified in the contract. A
party holding the cap will never pay a rate exceeding $k$; the rate of payment is
capped at $k$. Since the payments are made at a sequence of payment dates
$T_1, T_2, \ldots, T_n$, called a tenor, with $T_{i+1} = T_i + \delta$ (e.g. $\delta = \frac{1}{4}$ of a year), the
rate is capped over intervals of time of length $\delta$. Thus a cap is a collection of
caplets.


Figure 12.3: Payment dates and simple rates. 


Consider a caplet over $[T, T+\delta]$. Without the caplet, the holder of a loan
must pay at time $T+\delta$ an interest payment of $f\delta$, where $f$ is the floating,
simple rate over the interval $[T, T+\delta]$. If $f > k$, then a caplet allows the
holder to pay $k\delta$. Thus the caplet is worth $f\delta - k\delta$ at time $T+\delta$. If $f \le k$,
then the caplet is worthless. Therefore, the caplet's worth to the holder is
$(f-k)^+\delta$. In other words, a caplet pays to its holder the amount $(f-k)^+\delta$
at time $T+\delta$. Therefore a caplet is a call option on the rate $f$, and its price
at time $t$, as any other option, is given by the expected discounted payoff at
maturity under the EMM Q,

$$\mathrm{Caplet}(t) = E_Q\Bigl(\frac{\beta(t)}{\beta(T+\delta)}(f-k)^+\delta \Bigm| \mathcal{F}_t\Bigr). \tag{12.61}$$

To evaluate this expectation we need to know the distribution of $f$ under Q.
One way to find the distribution of the simple floating rate is to relate it to
the bond over the same time interval. By definition, $\frac{1}{P(T,T+\delta)} = 1 + f\delta$. This
relation is justified because both sides are the amounts obtained at time $T+\delta$ when one dollar is invested at
time $T$ in the bond and in the investment account with a simple rate $f$. Thus

$$f = \frac{1}{\delta}\Bigl(\frac{1}{P(T,T+\delta)} - 1\Bigr). \tag{12.62}$$

This is a basic relation between the rates that appear in option contracts and
bonds. It gives the distribution of $f$ in terms of that of $P(T,T+\delta)$.


Caplet as a Put Option on the Bond 


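We show next that a caplet is in effect a put option on the bond. From the
basic relation (EMM), $P(T,T+\delta) = E_Q\bigl(\frac{\beta(T)}{\beta(T+\delta)} \mid \mathcal{F}_T\bigr)$. Proceeding from (12.61)
by the law of double expectation, with $E = E_Q$,

$$\mathrm{Caplet}(t) = E\Bigl(E\Bigl(\frac{\beta(t)}{\beta(T)}\frac{\beta(T)}{\beta(T+\delta)}\Bigl(\frac{1}{P(T,T+\delta)} - 1 - k\delta\Bigr)^+ \Bigm| \mathcal{F}_T\Bigr) \Bigm| \mathcal{F}_t\Bigr)$$
$$= E\Bigl(\frac{\beta(t)}{\beta(T)}\Bigl(\frac{1}{P(T,T+\delta)} - 1 - k\delta\Bigr)^+ E\Bigl(\frac{\beta(T)}{\beta(T+\delta)} \Bigm| \mathcal{F}_T\Bigr) \Bigm| \mathcal{F}_t\Bigr)$$
$$= (1 + k\delta)\,E\Bigl(\frac{\beta(t)}{\beta(T)}\Bigl(\frac{1}{1+k\delta} - P(T,T+\delta)\Bigr)^+ \Bigm| \mathcal{F}_t\Bigr). \tag{12.63}$$

Thus a caplet is a put option on $P(T,T+\delta)$ with strike $\frac{1}{1+k\delta}$ and exercise time
$T$. In practical modelling, as in the HJM model with deterministic volatilities,
the distribution of $P(T,T+\delta)$ is Lognormal, giving rise to the Black-Scholes
type formula for a caplet, Black's (1976) formula.

The price of a caplet is easier to evaluate by using a forward measure (Theorem
12.5). Take $P(t,T+\delta)$ as a numeraire, which corresponds to the $(T+\delta)$-forward
measure $Q_{T+\delta}$,

$$\mathrm{Caplet}(t) = P(t,T+\delta)\,E_{Q_{T+\delta}}\bigl((f-k)^+\delta \mid \mathcal{F}_t\bigr) \tag{12.64}$$
$$= P(t,T+\delta)\,E_{Q_{T+\delta}}\Bigl(\Bigl(\frac{1}{P(T,T+\delta)} - 1 - k\delta\Bigr)^+ \Bigm| \mathcal{F}_t\Bigr).$$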

Caplet Pricing in HJM model 


The price of a caplet, which caps the rate at $k$ over the time interval $[T, T+\delta]$,
at time $t$ is given by equation (12.63) under the EMM Q, and by (12.64)
under the forward EMM $Q_{T+\delta}$. These can be evaluated in closed form when
the volatilities are non-random.

For evaluating the expectation in the caplet formula (12.64), note that if $X$
is Lognormal $e^{N(\mu,\sigma^2)}$, then $1/X = e^{N(-\mu,\sigma^2)}$ is also Lognormal. This allows
us to price the caplet by doing standard calculations for $E(X-K)^+$, giving
Black's caplet formula.
The price of a Cap is then found by using forward measures

$$\mathrm{Cap}(t) = \sum_{i=1}^n \mathrm{Caplet}_i(t) = \sum_{i=1}^n P(t,T_i)\,E_{Q_{T_i}}\bigl((f_{i-1} - k)^+\delta \mid \mathcal{F}_t\bigr). \tag{12.65}$$


The Cap pricing formula is given in Exercise 12.6. 
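
The closed form for a single caplet in the HJM model with deterministic volatilities (one term of the sum stated in Exercise 12.6) can be compared with a direct numerical evaluation of the expectation (12.64). The sketch below is an illustration under assumptions: the bond prices, strike and volatility $\gamma$ are made-up inputs, and the function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def caplet_hjm(p_T, p_Td, k, delta, gamma):
    """Single caplet, closed form: P(t,T) Phi(-h) - (1 + k delta) P(t,T+delta) Phi(-h - gamma).

    p_T = P(t,T), p_Td = P(t,T+delta), gamma^2 = Var(ln P(T,T+delta) | F_t).
    """
    h = (math.log((1.0 + k * delta) * p_Td / p_T) - 0.5 * gamma**2) / gamma
    return p_T * norm_cdf(-h) - (1.0 + k * delta) * p_Td * norm_cdf(-h - gamma)

def caplet_by_integration(p_T, p_Td, k, delta, gamma, n=20000, z_max=10.0):
    """P(t,T+delta) E[(1/P(T,T+delta) - 1 - k delta)^+] under the (T+delta)-forward measure.

    Uses ln P(T,T+delta) = ln(p_Td / p_T) - gamma Z + gamma^2 / 2 with Z standard Normal,
    integrated by the midpoint rule on [-z_max, z_max].
    """
    dz = 2.0 * z_max / n
    total = 0.0
    for i in range(n):
        z = -z_max + (i + 0.5) * dz
        bond = p_Td / p_T * math.exp(-gamma * z + 0.5 * gamma**2)
        payoff = max(1.0 / bond - 1.0 - k * delta, 0.0)
        total += payoff * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * dz
    return p_Td * total

# Illustrative inputs: quarterly caplet struck at 4%.
closed = caplet_hjm(0.95, 0.94, 0.04, 0.25, 0.004)
numeric = caplet_by_integration(0.95, 0.94, 0.04, 0.25, 0.004)
```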

A floor is a contract that gives its holder the right to receive the larger of
the two rates, the rate $k$ specified in the contract and the floating simple rate
$f$. Floors are priced similarly to Caps with floorlets being $(k - f_{i-1})^+$.


12.8 Brace-Gatarek-Musiela (BGM) Model 


In financial markets Black-Scholes like formulae are used for everything: bonds, 
rates, etc. To make the practice consistent with the theory, Brace, Gatarek 
and Musiela introduced a class of models, which can be seen as a subclass of 
HJM models, where instead of the forward rates f(t, T), the LIBOR rates are 
modelled (Brace et al.(1997), Musiela and Rutkowski (1998)). In BGM models 
the rates are Lognormal under forward measures, a fact that implies option
pricing formulae consistent with market use.


LIBOR 


The time $t$ forward $\delta$-LIBOR (London Interbank Offered Rate) is the simple
rate of interest on $[T, T+\delta]$

$$L(t,T) = \frac{1}{\delta}\Bigl(\frac{P(t,T)}{P(t,T+\delta)} - 1\Bigr); \tag{12.66}$$

note that $L(T,T) = f$ is the rate that gets capped, see (12.62).


Theorem 12.10 The SDE for $L(t,T)$ under the forward measure $Q_{T+\delta}$, when
$P(t,T+\delta)$ is the numeraire, is

$$dL(t,T) = L(t,T)\Bigl(\frac{1 + L(t,T)\delta}{L(t,T)\delta}\Bigr)\bigl(\tau(t,T+\delta) - \tau(t,T)\bigr)\,dB^{T+\delta}(t). \tag{12.67}$$

PROOF: By Corollary 12.3, equation (12.49),

$$\frac{P(t,T)}{P(t,T+\delta)} = \frac{P(0,T)}{P(0,T+\delta)}\,e^{-\int_0^t (\tau(s,T) - \tau(s,T+\delta))\,dB(s) - \frac{1}{2}\int_0^t (\tau^2(s,T) - \tau^2(s,T+\delta))\,ds}. \tag{12.68}$$

$dB^{T+\delta}(t) = dB(t) + \tau(t,T+\delta)\,dt$ is a $Q_{T+\delta}$-Brownian motion (12.54), giving

$$\frac{P(t,T)}{P(t,T+\delta)} = \frac{P(0,T)}{P(0,T+\delta)}\,e^{-\int_0^t (\tau(s,T) - \tau(s,T+\delta))\,dB^{T+\delta}(s) - \frac{1}{2}\int_0^t (\tau(s,T) - \tau(s,T+\delta))^2\,ds}. \tag{12.69}$$

Using the stochastic exponential equation, it follows that

$$d\Bigl(\frac{P(t,T)}{P(t,T+\delta)}\Bigr) = \Bigl(\frac{P(t,T)}{P(t,T+\delta)}\Bigr)\bigl(\tau(t,T+\delta) - \tau(t,T)\bigr)\,dB^{T+\delta}(t). \tag{12.70}$$

Finally, using the definition of $L(t,T)$, the SDE (12.67) is obtained.


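Choose now the HJM volatility $\sigma(t,s)$ such that $\gamma(t,T)$ is deterministic, where

$$\gamma(t,T) = \Bigl(\frac{1 + L(t,T)\delta}{L(t,T)\delta}\Bigr)\bigl(\tau(t,T+\delta) - \tau(t,T)\bigr) = \frac{1 + L(t,T)\delta}{L(t,T)\delta}\int_T^{T+\delta}\sigma(t,s)\,ds. \tag{12.71}$$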


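Corollary 12.11 Let $\gamma(t,T)$ be deterministic such that $\int_0^T \gamma^2(s,T)\,ds < \infty$.
Then $L(t,T)$, for $t \le T$, solves the SDE

$$dL(t,T) = L(t,T)\,\gamma(t,T)\,dB^{T+\delta}(t), \tag{12.72}$$

with the initial condition $L(0,T) = \frac{1}{\delta}\bigl(\frac{P(0,T)}{P(0,T+\delta)} - 1\bigr)$. Thus $L(t,T)$ is Lognormal
under the forward measure $Q_{T+\delta}$; moreover, it is a martingale,

$$L(T,T) = L(t,T)\,e^{\int_t^T \gamma(s,T)\,dB^{T+\delta}(s) - \frac{1}{2}\int_t^T \gamma^2(s,T)\,ds}, \tag{12.73}$$

and the conditional distribution of $L(T,T)$ given $\mathcal{F}_t$ is Lognormal, with $\ln L(T,T)$ having mean
$\ln L(t,T) - \frac{1}{2}\int_t^T \gamma^2(s,T)\,ds$ and variance $\int_t^T \gamma^2(s,T)\,ds$.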


We now prove that the choice of forward volatilities specified above is possible. 


Theorem 12.12 Let $\gamma(t,T)$, $t \le T$, be given and such that the Itô integral
$\int_0^T \gamma(s,T)\,dB(s)$ is defined. Then there exist forward rates volatilities $\sigma(t,T)$,
such that the integrated volatility $\int_T^{T+\delta}\sigma(s,u)\,du$ is determined uniquely, and
(12.72) holds.


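PROOF: (12.72) is equivalent to

$$L(t,T) = L(0,T)\,\mathcal{E}\Bigl(\int_0^t \gamma(s,T)\,dB^{T+\delta}(s)\Bigr), \quad t \le T. \tag{12.74}$$

By the definition of $L(t,T)$, and (12.69) we have

$$L(t,T) = \frac{1}{\delta}\Bigl(\frac{P(t,T)}{P(t,T+\delta)} - 1\Bigr) = \frac{1}{\delta}\Bigl(\frac{P(0,T)}{P(0,T+\delta)}\,\mathcal{E}\Bigl(\int_0^t \bigl(\tau(s,T+\delta) - \tau(s,T)\bigr)\,dB^{T+\delta}(s)\Bigr) - 1\Bigr).$$

Equating this to (12.74) we obtain the equation for stochastic exponentials

$$\mathcal{E}\Bigl(\int_0^t \bigl(\tau(s,T+\delta) - \tau(s,T)\bigr)\,dB^{T+\delta}(s)\Bigr) = (1-c)\,\mathcal{E}\Bigl(\int_0^t \gamma(s,T)\,dB^{T+\delta}(s)\Bigr) + c, \tag{12.75}$$

with $c = \frac{P(0,T+\delta)}{P(0,T)}$. Now, using the stochastic logarithm $\mathcal{L}$ (Theorem 5.3),

$$\int_0^t \Bigl(\int_T^{T+\delta}\sigma(s,u)\,du\Bigr)dB^{T+\delta}(s) = \mathcal{L}\Bigl((1-c)\,\mathcal{E}\Bigl(\int_0^t \gamma(s,T)\,dB^{T+\delta}(s)\Bigr) + c\Bigr), \tag{12.76}$$

from which the integrated volatility $\int_T^{T+\delta}\sigma(s,u)\,du$ and a suitable process
$\sigma(t,T)$ can be found (see Exercise 12.7).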


Caplet in BGM 


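A caplet is a call on LIBOR: it pays $\delta(L(T,T) - k)^+$ at $T + \delta$. By (12.53) its
price at time $t$ is

$$C(t) = P(t,T+\delta)\,E_{Q_{T+\delta}}\bigl(\delta(L(T,T) - k)^+ \mid \mathcal{F}_t\bigr). \tag{12.77}$$

Using that $L(T,T)$ is Lognormal under the forward measure $Q_{T+\delta}$ (12.73),
the caplet is priced by Black's caplet formula (which agrees with the market) by
evaluating under the forward measure,

$$C(t) = \delta P(t,T+\delta)\bigl(L(t,T)N(h_1) - kN(h_2)\bigr), \tag{12.78}$$

$$h_{1,2} = \frac{\ln\frac{L(t,T)}{k} \pm \frac{1}{2}\int_t^T \gamma^2(s,T)\,ds}{\sqrt{\int_t^T \gamma^2(s,T)\,ds}}.$$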
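
Black's caplet formula is straightforward to code once the total variance $\int_t^T \gamma^2(s,T)\,ds$ is known. The sketch below is a minimal illustration: the function names are hypothetical, the inputs are made-up, and the factor $\delta$ is included to match the payoff $\delta(L(T,T) - k)^+$ in (12.77). The floorlet is added only to exhibit put-call parity.

```python
import math

def norm_cdf(x):
    """Standard Normal distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_caplet(p_Td, libor, k, delta, total_var):
    """delta P(t,T+delta) (L N(h1) - k N(h2)); total_var is the integral of gamma^2 over [t, T]."""
    sd = math.sqrt(total_var)
    h1 = (math.log(libor / k) + 0.5 * total_var) / sd
    h2 = h1 - sd
    return delta * p_Td * (libor * norm_cdf(h1) - k * norm_cdf(h2))

def black_floorlet(p_Td, libor, k, delta, total_var):
    """Companion floorlet with payoff delta (k - L(T,T))^+ under the same Lognormal law."""
    sd = math.sqrt(total_var)
    h1 = (math.log(libor / k) + 0.5 * total_var) / sd
    h2 = h1 - sd
    return delta * p_Td * (k * norm_cdf(-h2) - libor * norm_cdf(-h1))

# Illustrative inputs: P(t,T+delta) = 0.94, L(t,T) = 4.5%, k = 4%, delta = 0.25,
# constant gamma = 20% over one year, so total_var = 0.04.
cap = black_caplet(0.94, 0.045, 0.04, 0.25, 0.04)
flo = black_floorlet(0.94, 0.045, 0.04, 0.25, 0.04)
parity_gap = (cap - flo) - 0.25 * 0.94 * (0.045 - 0.04)
```

The difference caplet minus floorlet equals $\delta P(t,T+\delta)(L(t,T) - k)$, which follows from the martingale property of $L(t,T)$ under $Q_{T+\delta}$.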


SDEs for Forward LIBOR. under Different Measures 


Consider now a sequence of dates $T_0, T_1, \ldots, T_n$, and denote by $Q_{T_k}$ the forward
measure corresponding to the numeraire $P(t,T_k)$. Corollary 12.11 states that
$L(t,T_{k-1})$ for $t \le T_{k-1}$ satisfies the SDE

$$dL(t,T_{k-1}) = \gamma(t,T_{k-1})\,L(t,T_{k-1})\,dB^{T_k}(t), \tag{12.79}$$

where $B^{T_k}(t)$ is a $Q_{T_k}$-Brownian motion.
An SDE for all of the rates under a single measure is sometimes required
(for a swaption); it is given by


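Theorem 12.13 For a given $i$, the SDE for the forward LIBOR $L(t,T_{k-1})$
on $[T_{k-1}, T_k]$ under the forward measure $Q_{T_i}$ is given by (12.79) for $i = k$, by
(12.80) for $i > k$, and by (12.81) for $i < k$.

$$dL(t,T_{k-1}) = -L(t,T_{k-1})\sum_{j=k}^{i-1}\frac{\delta\,\gamma(t,T_{k-1})\,\gamma(t,T_j)\,L(t,T_j)}{1 + \delta L(t,T_j)}\,dt + \gamma(t,T_{k-1})\,L(t,T_{k-1})\,dB^{T_i}(t). \tag{12.80}$$

$$dL(t,T_{k-1}) = L(t,T_{k-1})\sum_{j=i}^{k-1}\frac{\delta\,\gamma(t,T_{k-1})\,\gamma(t,T_j)\,L(t,T_j)}{1 + \delta L(t,T_j)}\,dt + \gamma(t,T_{k-1})\,L(t,T_{k-1})\,dB^{T_i}(t). \tag{12.81}$$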
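
The drift of $L(t,T_{k-1})$ under a forward measure $Q_{T_i}$ is a finite sum over the intermediate LIBOR rates, which is how it is typically used in simulation. A minimal sketch follows; the list layout `libor[j]` $= L(t,T_j)$, `gamma[j]` $= \gamma(t,T_j)$ and the function name are conventions assumed here for illustration, and the sign pattern follows the change of Brownian motion (12.83).

```python
def libor_drift(k, i, libor, gamma, delta):
    """Coefficient of dt in the SDE for L(t, T_{k-1}) under the forward measure Q_{T_i}.

    libor[j] = L(t, T_j) and gamma[j] = gamma(t, T_j) for j = 0, ..., n-1.
    Returns 0 for i == k (12.79), a negative sum for i > k (12.80),
    and a positive sum for i < k (12.81).
    """
    def term(j):
        return delta * gamma[k - 1] * gamma[j] * libor[j] / (1.0 + delta * libor[j])

    if i == k:
        return 0.0
    if i > k:
        return -libor[k - 1] * sum(term(j) for j in range(k, i))
    return libor[k - 1] * sum(term(j) for j in range(i, k))

# Made-up forward LIBOR curve and flat 20% LIBOR volatilities, semi-annual tenor.
libor = [0.030, 0.035, 0.040, 0.045]
gamma = [0.2, 0.2, 0.2, 0.2]
delta = 0.5

drift_own = libor_drift(2, 2, libor, gamma, delta)     # own measure: martingale
drift_later = libor_drift(2, 3, libor, gamma, delta)   # measure of a later date
drift_earlier = libor_drift(2, 1, libor, gamma, delta) # measure of an earlier date
```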


PROOF: We establish the relationship between different forward measures
as well as corresponding Brownian motions. By (12.54) the Brownian motion
under the forward measure $Q_{T_k}$ satisfies

$$dB^{T_k}(t) = dB(t) + \tau(t,T_k)\,dt. \tag{12.82}$$

Hence, using this with $k - 1$, we obtain that $B^{T_k}$ and $B^{T_{k-1}}$ are related by

$$dB^{T_k}(t) = dB^{T_{k-1}}(t) + \bigl(\tau(t,T_k) - \tau(t,T_{k-1})\bigr)\,dt.$$

By (12.71), from the choice of $\gamma$,

$$\tau(t,T_k) - \tau(t,T_{k-1}) = \gamma(t,T_{k-1})\,\frac{L(t,T_{k-1})\delta}{1 + L(t,T_{k-1})\delta},$$

giving the relationship for the SDEs for LIBOR

$$dB^{T_k}(t) = dB^{T_{k-1}}(t) + \gamma(t,T_{k-1})\,\frac{L(t,T_{k-1})\delta}{1 + L(t,T_{k-1})\delta}\,dt.$$

Now fix $i > k$. Using the above relation iteratively from $i$ to $k$, we obtain

$$dB^{T_i}(t) = dB^{T_k}(t) + \sum_{j=k}^{i-1}\gamma(t,T_j)\,\frac{L(t,T_j)\delta}{1 + L(t,T_j)\delta}\,dt. \tag{12.83}$$

Replace $B^{T_k}$ in the SDE (12.79) for $L(t,T_{k-1})$ under $Q_{T_k}$,
$dL(t,T_{k-1}) = \gamma(t,T_{k-1})L(t,T_{k-1})\,dB^{T_k}(t)$, by $B^{T_i}$ with the drift from (12.83)
to obtain (12.80). The case $i < k$ is proved similarly.


Another proof can be done by using the result on the SDE under a new nu- 
meraire, Theorem 11.18, with 6(t) = P(t, Tp) and S(t) = P(t, Tj). 
Choice of Bond Volatilities


We comment briefly on the choice of bond volatilities. For a sequence of dates
equation (12.71) implies, from the choice of $\gamma$,

$$\tau(t,T_k) - \tau(t,T_{k-1}) = \gamma(t,T_{k-1})\,\frac{L(t,T_{k-1})\delta}{1 + L(t,T_{k-1})\delta}. \tag{12.84}$$

Therefore the bond volatilities $\tau(t,T_k)$ can be obtained recursively,

$$\tau(t,T_k) = \tau(t,T_{k-1}) + \gamma(t,T_{k-1})\,\frac{L(t,T_{k-1})\delta}{1 + L(t,T_{k-1})\delta} = \sum_{j=0}^{k-1}\gamma(t,T_j)\,\frac{L(t,T_j)\delta}{1 + L(t,T_j)\delta}. \tag{12.85}$$


In practice $\gamma(t,T)$ is taken to be a function of $T$ only (Rebonato, Brace), for
example $\gamma(t,T) = (a + be^{cT})$.


12.9 Swaps and Swaptions 


A basic instrument in the interest rate market is the payer swap, in which the
floating rate is swapped in arrears (at $T_i$) against a fixed rate $k$ at $n$ intervals
of length $\delta = T_i - T_{i-1}$, $i = 1, 2, \ldots, n$. The other party in the swap enters
a receiver swap, in which a fixed rate is swapped against the floating. By a
swap we shall mean only a payer swap. A swap value at time $t$, by the basic
relation (12.50), is given by

$$\mathrm{Swap}(t,T_0,k) = E_Q\Bigl(\sum_{i=1}^n \delta\,\frac{\beta(t)}{\beta(T_i)}\bigl(L(T_{i-1},T_{i-1}) - k\bigr) \Bigm| \mathcal{F}_t\Bigr).$$

This can be written by using forward measures (Theorem 12.5)

$$\mathrm{Swap}(t,T_0,k) = \delta\sum_{i=1}^n P(t,T_i)\,E_{Q_{T_i}}\bigl(L(T_{i-1},T_{i-1}) - k \mid \mathcal{F}_t\bigr) = \delta\sum_{i=1}^n P(t,T_i)\bigl(L(t,T_{i-1}) - k\bigr), \tag{12.86}$$

as under the forward measure $Q_{T_i}$, $L(t,T_{i-1})$ is a martingale (Corollary 12.11).

A swap agreement is worthless at initiation. The forward swap rate is that
fixed rate of interest which makes a swap worthless. Namely, the swap rate at
time $t$ for the date $T_0$ solves, by definition, $\mathrm{Swap}(t,T_0,k) = 0$. Hence

$$k(t,T_0) = \frac{\sum_{i=1}^n P(t,T_i)\,L(t,T_{i-1})}{\sum_{i=1}^n P(t,T_i)}. \tag{12.87}$$

Thus the $t$-value of the swap in terms of this rate is

$$\mathrm{Swap}(t,T_0,k) = \delta\Bigl(\sum_{i=1}^n P(t,T_i)\Bigr)\bigl(k(t,T_0) - k\bigr). \tag{12.88}$$
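
The relations (12.86) to (12.88) can be verified on a small discount curve. The sketch below is illustrative: the list `bonds` of made-up prices $P(t,T_0), \ldots, P(t,T_n)$ and the function names are assumptions, and the forward LIBOR rates are computed from the bonds by (12.66).

```python
def forward_libors(bonds, delta):
    """L(t, T_{i-1}) = (P(t,T_{i-1}) / P(t,T_i) - 1) / delta for i = 1..n, from (12.66)."""
    return [(bonds[i - 1] / bonds[i] - 1.0) / delta for i in range(1, len(bonds))]

def swap_value(bonds, k, delta):
    """Payer swap value (12.86): delta * sum_i P(t,T_i) (L(t,T_{i-1}) - k)."""
    libors = forward_libors(bonds, delta)
    return delta * sum(bonds[i] * (libors[i - 1] - k) for i in range(1, len(bonds)))

def swap_rate(bonds, delta):
    """Forward swap rate (12.87): the fixed rate k that makes the swap worthless."""
    libors = forward_libors(bonds, delta)
    numerator = sum(bonds[i] * libors[i - 1] for i in range(1, len(bonds)))
    return numerator / sum(bonds[1:])

# Made-up discount curve P(t,T_0), ..., P(t,T_3) with semi-annual payments.
bonds = [0.99, 0.97, 0.95, 0.93]
delta, k = 0.5, 0.04

value = swap_value(bonds, k, delta)
rate = swap_rate(bonds, delta)
annuity = delta * sum(bonds[1:])
```

Here `value` agrees both with `annuity * (rate - k)`, which is (12.88), and with the telescoped form $P(t,T_0) - P(t,T_n) - k\delta\sum_i P(t,T_i)$.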


Other expressions for the Swap are given in Exercises 12.9 and 12.10.

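A payer swaption maturing (with exercise) at $T = T_0$ gives its holder the
right to enter a payer swap at time $T_0$. A swaption, $\mathrm{Swaption}(t,T_0)$, delivers
at time $T_0$ a swap, $\mathrm{Swap}(T_0,T_0)$, when the value of that swap is positive. This
shows the value of the swaption at maturity is $(\mathrm{Swap}(T_0,T_0))^+$. Thus its value
at time $t \le T_0$ is

$$\mathrm{Swaption}(t,T_0) = E_Q\Bigl(\frac{\beta(t)}{\beta(T_0)}\bigl(\mathrm{Swap}(T_0,T_0)\bigr)^+ \Bigm| \mathcal{F}_t\Bigr) = \delta\,E_Q\Bigl(\frac{\beta(t)}{\beta(T_0)}\Bigl(\sum_{i=1}^n P(T_0,T_i)\Bigr)\bigl(k(T_0,T_0) - k\bigr)^+ \Bigm| \mathcal{F}_t\Bigr). \tag{12.89}$$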


Consider taking $\sum_{i=1}^n P(t,T_i)$ as the numeraire instead of $\beta(t)$. The process
$\sum_i P(t,T_i)/\beta(t)$ is a Q-martingale, as a sum of martingales. By Theorem
11.17, the new measure $\hat{Q}_{T_0}$ defined by $\Lambda(T) = \frac{\sum_{i=1}^n P(T_0,T_i)}{\beta(T_0)\sum_{i=1}^n P(0,T_i)}$ as
its Radon-Nikodym derivative with respect to Q gives (a call on the swap rate)

$$\mathrm{Swaption}(t,T_0) = \delta\Bigl(\sum_{i=1}^n P(t,T_i)\Bigr)E_{\hat{Q}_{T_0}}\bigl((k(T_0,T_0) - k)^+ \mid \mathcal{F}_t\bigr). \tag{12.90}$$
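The distribution of the swap rate under the swap-rate measure $\hat{Q}_{T_0}$ is approximately
Lognormal. Differentiating, simplifying and approximating $k(t,T_0)$ in
(12.87) leads to the SDE for the swap rate

$$dk(t,T_0) = \hat{\sigma}(t,T_0)\,k(t,T_0)\,d\hat{B}_{T_0},$$

where $\hat{B}_{T_0}$ is a $\hat{Q}_{T_0}$-Brownian motion, and

$$\hat{\sigma}(t,T_0) = \sum_{i=1}^n w_i\,\gamma(t,T_{i-1}), \qquad w_i = \frac{P(0,T_i)\,L(0,T_{i-1})}{\sum_{j=1}^n P(0,T_j)\,L(0,T_{j-1})}.$$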


The expression for the swaption (12.90) integrates to the Black Swaption For- 
mula as used in the market. 
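
Under the approximation that the swap rate is Lognormal under the swap-rate measure, (12.90) integrates to a Black-type expression with the annuity $\delta\sum_i P(t,T_i)$ as the multiplier. The sketch below illustrates this under that assumption; the function names, the made-up inputs and the total variance parameter (the time integral of the squared swap-rate volatility) are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def annuity(bonds, delta):
    """delta * sum_{i=1}^n P(t,T_i), with bonds = [P(t,T_0), ..., P(t,T_n)]."""
    return delta * sum(bonds[1:])

def black_payer_swaption(ann, rate, k, total_var):
    """ann * (k(t) N(d1) - k N(d2)): a call on the swap rate, assumed Lognormal."""
    sd = math.sqrt(total_var)
    d1 = (math.log(rate / k) + 0.5 * total_var) / sd
    d2 = d1 - sd
    return ann * (rate * norm_cdf(d1) - k * norm_cdf(d2))

def black_receiver_swaption(ann, rate, k, total_var):
    """ann * (k N(-d2) - k(t) N(-d1)): the corresponding put on the swap rate."""
    sd = math.sqrt(total_var)
    d1 = (math.log(rate / k) + 0.5 * total_var) / sd
    d2 = d1 - sd
    return ann * (k * norm_cdf(-d2) - rate * norm_cdf(-d1))

# Made-up inputs: discount curve, swap rate from (12.87), strike 4%, total variance 0.04.
bonds = [0.99, 0.97, 0.95, 0.93]
delta, k, total_var = 0.5, 0.04, 0.04
ann = annuity(bonds, delta)
rate = (bonds[0] - bonds[-1]) / ann
payer = black_payer_swaption(ann, rate, k, total_var)
receiver = black_receiver_swaption(ann, rate, k, total_var)
```

The difference payer minus receiver equals the value of the forward swap, `ann * (rate - k)`, as in (12.88).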

Another way to evaluate a swaption is by simulation. For details on analytic 
approximations and simulations see Brace et al. (1997), Brace (1998), Musiela 
and Rutkowski (1998). 
Remark 12.4: We have presented a brief mathematical treatment of models
for rates and bonds based on diffusion processes. There is a large amount of
literature on models based on processes with jumps. For jumps in the spot
rate see Borovkov et al. (2003) and references therein. HJM with jumps
were studied by Shirakawa (1991), and more generally by Björk, Kabanov
and Runggaldier (BKR) (1996) and Björk, Di Masi, Kabanov and Runggaldier
(BDMKR) (1997). Kennedy (1994) considered a Gaussian Markov
field model. HJM and BDMKR models can be criticized for being an infinite- 
dimensional diffusion driven by a finite number of independent noise processes. 
Cont (1998) suggests modelling the forward curves by an infinite-dimensional 
diffusion driven by an infinite-dimensional Brownian motion. This approach 
is included in Random Fields models, such as Brownian and Poisson sheet 
models, also known as models with space-time noise. The most general model 
that allows for existence of EMM is given in Hamza et al. (2002); it includes 
Gaussian and Poisson random field models. 


12.10 Exercises 


Exercise 12.1: Show that a European call option on the T-bond is given by
$C(t) = P(t,T)\,Q_T(P(s,T) > K \mid \mathcal{F}_t) - K P(t,s)\,Q_s(P(s,T) > K \mid \mathcal{F}_t)$, where
$s$ is the exercise time of the option and $Q_s$, $Q_T$ are the $s$- and $T$-forward measures.


Exercise 12.2: Show that a European call option on the bond in the Merton
model is given by

$$P(t,T)\,\Phi\Bigl(\frac{\ln\frac{P(t,T)}{KP(t,s)} + \frac{\sigma^2(T-s)^2(s-t)}{2}}{\sigma(T-s)\sqrt{s-t}}\Bigr) - KP(t,s)\,\Phi\Bigl(\frac{\ln\frac{P(t,T)}{KP(t,s)} - \frac{\sigma^2(T-s)^2(s-t)}{2}}{\sigma(T-s)\sqrt{s-t}}\Bigr).$$

Exercise 12.3: (Stochastic Fubini Theorem)
Let $H(t,s)$ be continuous, $0 \le t,s \le T$, and for any fixed $s$, let $H(t,s)$ as a process
in $t$, $0 \le t \le T$, be adapted to the Brownian motion filtration $\mathcal{F}_t$. Assume
$\int_0^T H^2(t,s)\,dt < \infty$, so that for each $s$ the Itô integral $X(s) = \int_0^T H(t,s)\,dW(t)$
is defined. Since $H(t,s)$ is continuous, $Y(t) = \int_0^T H(t,s)\,ds$ is defined and it is
continuous and adapted. Assume

$$\int_0^T \Bigl(E\int_0^T H^2(t,s)\,dt\Bigr)^{1/2} ds < \infty.$$

1. Show that $\int_0^T E|X(s)|\,ds \le \int_0^T \bigl(E\int_0^T H^2(t,s)\,dt\bigr)^{1/2} ds < \infty$; consequently
$\int_0^T X(s)\,ds$ exists.

2. If $0 = t_0 < t_1 < \ldots < t_n = T$ is a partition of $[0,T]$, show that

$$\int_0^T \Bigl(\sum_{i=0}^{n-1} H(t_i,s)\bigl(W(t_{i+1}) - W(t_i)\bigr)\Bigr)ds = \sum_{i=0}^{n-1}\Bigl(\int_0^T H(t_i,s)\,ds\Bigr)\bigl(W(t_{i+1}) - W(t_i)\bigr).$$

3. By taking the limits as the partition shrinks, show that

$$\int_0^T X(s)\,ds = \int_0^T Y(t)\,dW(t),$$

in other words the order of integration can be interchanged:

$$\int_0^T \Bigl(\int_0^T H(t,s)\,dW(t)\Bigr)ds = \int_0^T \Bigl(\int_0^T H(t,s)\,ds\Bigr)dW(t).$$


Exercise 12.4: (One factor HJM) Show that the correlation between the forward
rates $f(t,T_1)$ and $f(t,T_2)$ in the HJM model with deterministic volatilities
$\sigma(t,T)$ is given by

$$\rho(T_1,T_2) = \mathrm{Corr}\bigl(f(t,T_1), f(t,T_2)\bigr) = \frac{\int_0^t \sigma(s,T_1)\sigma(s,T_2)\,ds}{\sqrt{\int_0^t \sigma^2(s,T_1)\,ds\,\int_0^t \sigma^2(s,T_2)\,ds}}.$$

Give the class of volatilities for which the correlation is one.


Exercise 12.5: Find the forward rates f(t,T) in Vasicek’s model. Give the 
price of a cap. 


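Exercise 12.6: (Caps Pricing Market Formula)
Show that in the HJM model with a deterministic $\sigma(t,T)$ the price of a cap
with trading dates $T_i = T + i\delta$, $i = 1, \ldots, n$, and strike rate $k$ is given by

$$\sum_{i=1}^n \Bigl(P(t,T_{i-1})\,\Phi\bigl(-h_{i-1}(t)\bigr) - (1 + k\delta)P(t,T_i)\,\Phi\bigl(-h_{i-1}(t) - \gamma_{i-1}(t)\bigr)\Bigr),$$

where $\gamma_{i-1}^2(t) = \mathrm{Var}\bigl(\ln P(T_{i-1},T_i)\bigr) = \int_t^{T_{i-1}} |\tau(s,T_i) - \tau(s,T_{i-1})|^2\,ds$, with

$$\tau(t,T) = \int_t^T \sigma(t,s)\,ds \quad\text{and}\quad h_{i-1}(t) = \frac{1}{\gamma_{i-1}(t)}\Bigl(\ln\frac{(1+k\delta)P(t,T_i)}{P(t,T_{i-1})} - \frac{1}{2}\gamma_{i-1}^2(t)\Bigr).$$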


Exercise 12.7: Let $0 < c < 1$, and let $\gamma(t)$ be a given bounded deterministic function.
Show that there is a process $\theta(s)$ such that

$$c + (1-c)\,\mathcal{E}\Bigl(\int_0^t \gamma(s)\,dB(s)\Bigr) = \mathcal{E}\Bigl(\int_0^t \theta(s)\,dB(s)\Bigr).$$

Hence deduce the existence of forward rates volatilities $\sigma(t,T)$ in HJM from
specification of the forward LIBOR volatilities $\gamma(t,T)$ in BGM.




Exercise 12.8: (Two factor and higher HJM Models)
Two-factor HJM is given by the SDE

$$df(t,T) = \alpha(t,T)\,dt + \sigma_1(t,T)\,dW_1(t) + \sigma_2(t,T)\,dW_2(t),$$

where $W_1$ and $W_2$ are independent Brownian motions.

1. Give the stochastic differential equation for the log of the bond prices, and
show that

$$\frac{P(t,T)}{\beta(t)} = P(0,T)\,e^{-\int_0^t A(u,T)\,du - \int_0^t \tau_1(u,T)\,dW_1(u) - \int_0^t \tau_2(u,T)\,dW_2(u)},$$

with $\tau_i(t,T) = \int_t^T \sigma_i(t,s)\,ds$, $i = 1, 2$, and $A(t,T) = \int_t^T \alpha(t,s)\,ds$.

2. Using the same proof as in the one-factor model, show that the no-arbitrage
condition is given by

$$\alpha(t,T) = \sigma_1(t,T)\int_t^T \sigma_1(t,s)\,ds + \sigma_2(t,T)\int_t^T \sigma_2(t,s)\,ds.$$


Exercise 12.9: Show that a swap can be written as

$$\mathrm{Swap}(t,T_0,k) = P(t,T_0) - P(t,T_n) - k\delta\sum_{i=1}^n P(t,T_i).$$

Exercise 12.10: Denote by $b(t) = \delta\sum_{i=1}^n P(t,T_i)$. Show that for $0 < t \le T_0$,
$\mathrm{Swap}(t,T_0,k) = P(t,T_0) - P(t,T_n) - kb(t)$, and that the swap rate is

$$k(t) = \frac{P(t,T_0) - P(t,T_n)}{b(t)}.$$


Exercise 12.11: (Jamshidian (1996) Swaptions Pricing Formula)
Assume that the swap rate $k(t) > 0$, and that $v^2 = \int_t^T \frac{1}{k^2(s)}\,d[k,k](s)$ is
deterministic. Show that

$$\mathrm{Swaption}(t) = b(t)\bigl(a_+(t)k(t) - k\,a_-(t)\bigr),$$


where 
Chapter 13


Applications in Biology 


In this chapter applications of stochastic calculus to population models are 
given. Diffusion models, Markov Jump process models, age-dependent non- 
Markov models and stochastic models for competition of species are presented. 
Diffusion Models are used in various areas of Biology as models for population 
growth and genetic evolution. Birth and Death processes are random walks 
in continuous time and are met in various applications. A novel approach 
to the age-dependent branching (Bellman-Harris) process is given by treating 
it as a simple measure-valued process. The stochastic model for interacting 
populations that generalizes the Lotka-Volterra prey-predator model is treated
by using a semimartingale representation. It is possible to formulate these 
models as stochastic differential or integral equations. We demonstrate how 
results on stochastic differential equations and martingales presented in earlier 
chapters are applied for their analysis. 


13.1 Feller’s Branching Diffusion 


A simple branching process is a model in which individuals reproduce independently
of each other and of the history of the process. The continuous
approximation to a branching process is the branching diffusion. It is given by
the stochastic differential equation for the population size $X(t)$, $0 \le X(t) < \infty$,

$$dX(t) = \alpha X(t)\,dt + \sigma\sqrt{X(t)}\,dB(t). \tag{13.1}$$

In this model the infinitesimal drift and variance are proportional to the population
size. The corresponding forward (Kolmogorov or Fokker-Planck) equation
for the probability density of $X(t)$ is

$$\frac{\partial p(t,x)}{\partial t} = -\alpha\frac{\partial\bigl(x\,p(t,x)\bigr)}{\partial x} + \frac{\sigma^2}{2}\frac{\partial^2\bigl(x\,p(t,x)\bigr)}{\partial x^2}. \tag{13.2}$$
Analysis of the process was done by solving the partial differential equation
(13.2) (Feller (1951)). Here we demonstrate the stochastic calculus approach
by obtaining the information directly from the stochastic equation (13.1). First
we prove that a solution of (13.1) is always non-negative, moreover once it hits
0 it stays at 0 forever.


Theorem 13.1 Let $X(t)$ solve (13.1) and $X(0) > 0$. Then
$P(X(t) \ge 0 \text{ for all } t \ge 0) = 1$; moreover, if $\tau = \inf\{t : X(t) = 0\}$, then
$X(t) = 0$ for all $t \ge \tau$.

PROOF: Consider first (13.1) with $X(0) = 0$. Clearly, $X(t) \equiv 0$ is a solution.
Conditions of the Yamada-Watanabe result on the existence and uniqueness
are satisfied, see Theorem 5.5. Therefore the solution is unique, and $X(t) \equiv 0$ is
the only solution. Consider now (13.1) with $X(0) > 0$. The first hitting time
of zero $\tau$ is a stopping time with $X(\tau) = 0$. By the strong Markov property
$X(t)$ for $t > \tau$ is the solution to (13.1) with zero initial condition, hence it is
zero for all $t > \tau$. Thus if $X(0) > 0$, the process is positive for all $t < \tau$, and
zero for all $t \ge \tau$.


The next result describes the exponential growth of the population. 


Theorem 13.2 Let $X(t)$ solve (13.1) and $X(0) > 0$. Then $EX(t) = X(0)e^{\alpha t}$.
$X(t)e^{-\alpha t}$ is a non-negative martingale which converges almost surely to a
non-degenerate limit as $t \to \infty$.


PROOF: First we show that $X(t)$ is integrable. Since Itô integrals are local
martingales, $\int_0^t \sqrt{X(s)}\,dB(s)$ is a local martingale. Let $T_n$ be a localizing
sequence, so that $\int_0^{t\wedge T_n} \sqrt{X(s)}\,dB(s)$ is a martingale in $t$ for any fixed $n$.
Then using (13.1) we can write

$$X(t\wedge T_n) = X(0) + \alpha\int_0^{t\wedge T_n} X(s)\,ds + \sigma\int_0^{t\wedge T_n}\sqrt{X(s)}\,dB(s). \tag{13.3}$$

Taking expectation, we obtain

$$EX(t\wedge T_n) = X(0) + \alpha E\int_0^{t\wedge T_n} X(s)\,ds. \tag{13.4}$$

Since $X$ is non-negative, $\int_0^{t\wedge T_n} X(s)\,ds$ is increasing to $\int_0^t X(s)\,ds$. Therefore
by monotone convergence,

$$E\int_0^{t\wedge T_n} X(s)\,ds \to E\int_0^t X(s)\,ds,$$

as $n \to \infty$. Therefore $\lim_{n\to\infty} EX(t\wedge T_n) = X(0) + \alpha E\int_0^t X(s)\,ds$. Since
$X(t\wedge T_n) \to X(t)$ we obtain by Fatou's lemma (noting that if a limit exists,
then $\liminf = \lim$)

$$EX(t) = E\lim_{n\to\infty} X(t\wedge T_n) = E\liminf_{n\to\infty} X(t\wedge T_n) \le \liminf_{n\to\infty} EX(t\wedge T_n) = \lim_{n\to\infty} EX(t\wedge T_n) = X(0) + \alpha E\int_0^t X(s)\,ds.$$

Using Gronwall's inequality (Theorem 1.20) we obtain that the expectation is
finite,

$$EX(t) \le (1 + X(0))e^{\alpha t}. \tag{13.5}$$

Now we can show that the local martingale $\int_0^t \sqrt{X(s)}\,dB(s)$ is a true
martingale. Consider its quadratic variation

$$E\Big[\int_0^{\cdot}\sqrt{X(s)}\,dB(s), \int_0^{\cdot}\sqrt{X(s)}\,dB(s)\Big](t) = E\int_0^t X(s)\,ds < Ce^{\alpha t}.$$

Thus by Theorem 7.35 $\int_0^t \sqrt{X(s)}\,dB(s)$ is a martingale. Now we can take
expectations in (13.1). Differentiating with respect to $t$ and solving the resulting
equation, we find $EX(t) = X(0)e^{\alpha t}$.

To prove the second statement, use integration by parts for $X(t)e^{-\alpha t}$ to
obtain

$$U(t) = X(t)e^{-\alpha t} = X(0) + \sigma\int_0^t e^{-\alpha s}\sqrt{X(s)}\,dB(s),$$

and $U(t)$ is a local martingale. Using $EX(s) = X(0)e^{\alpha s}$,

$$E[U,U](t) = \sigma^2 E\int_0^t e^{-2\alpha s}X(s)\,ds = \sigma^2 X(0)\int_0^t e^{-\alpha s}\,ds < \infty,$$

and $E[U,U](\infty) < \infty$. Therefore $U(t)$ is a martingale; moreover, it is square
integrable on $[0,\infty)$. Since it is uniformly integrable, it converges to a
non-degenerate limit.


The next result uses diffusion theory to find the probability of extinction. 
Theorem 13.3 Let $X(t)$ solve (13.1) and $X(0) = x > 0$. Then the probability
of ultimate extinction is $e^{-2\alpha x/\sigma^2}$ if $\alpha > 0$, and 1 if $\alpha \le 0$.


PROOF: Let $T_0 = \tau = \inf\{t : X(t) = 0\}$ be the first time of hitting zero, and
$T_b$ the first time of hitting $b > 0$. By the formula (6.52) on exit probabilities

$$P_x(T_0 < T_b) = \frac{S(b) - S(x)}{S(b) - S(0)}, \tag{13.6}$$

where $S(x)$ is the scale function, see (6.50). In this example

$$S(x) = \int_{x_1}^{x} \exp\Big(-\int_{x_0}^{u} \frac{2\alpha}{\sigma^2}\,dy\Big)\,du, \tag{13.7}$$

where $x_0$ and $x_1$ are some positive constants. Simplifying (13.7) we obtain
from (13.6)

$$P_x(T_0 < T_b) = \frac{e^{-cx} - e^{-cb}}{1 - e^{-cb}}, \tag{13.8}$$

with $c = 2\alpha/\sigma^2$. The probability of ultimate extinction is obtained by taking
the limit as $b \to \infty$ in (13.8), that is,

$$P_x(T_0 < T_\infty) = \lim_{b\to\infty} P_x(T_0 < T_b) = \begin{cases} e^{-2\alpha x/\sigma^2} & \text{if } \alpha > 0,\\ 1 & \text{if } \alpha \le 0, \end{cases} \tag{13.9}$$

where $T_\infty = \lim_{b\to\infty} T_b$ is the explosion time, which is infinite if the explosion
does not occur.
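
The exit probability (13.8) and its limit (13.9) are easy to evaluate numerically. The following is a minimal Python sketch, not part of the original text; the function names `exit_prob_zero_before_b` and `extinction_prob` are illustrative choices. For $\alpha < 0$ the formula is rewritten with numerator and denominator multiplied by $e^{cb}$ so that no large exponentials are formed.

```python
import math


def exit_prob_zero_before_b(x, b, alpha, sigma):
    """P_x(T_0 < T_b) from (13.8), for 0 <= x <= b."""
    c = 2.0 * alpha / sigma ** 2
    if c == 0.0:
        # alpha = 0: the scale function is S(x) = x, so (13.6) gives (b - x) / b.
        return (b - x) / b
    if c > 0.0:
        return (math.exp(-c * x) - math.exp(-c * b)) / (1.0 - math.exp(-c * b))
    # c < 0: same ratio, multiplied top and bottom by exp(c * b).
    return (math.exp(c * (b - x)) - 1.0) / (math.exp(c * b) - 1.0)


def extinction_prob(x, alpha, sigma):
    """Probability of ultimate extinction, the limit (13.9)."""
    if alpha > 0.0:
        return math.exp(-2.0 * alpha * x / sigma ** 2)
    return 1.0
```

For a large barrier $b$ the first function should agree with the second, which gives a quick check of the limit taken in (13.9).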


Branching diffusion is related to a stopped Brownian motion with drift by a 
change of time. See Example 7.17. 


13.2 Wright-Fisher Diffusion 


In population dynamics frequencies of genes or alleles are studied. It is
assumed for simplicity that the population size $N$ is fixed and individuals are of
two types: $A$ and $a$. If individuals of type $A$ mutate to type $a$ with the rate
$\gamma_1/N$ and individuals of type $a$ mutate to type $A$ with the rate $\gamma_2/N$, then
it is possible to approximate the frequency of type $A$ individuals $X(t)$ by the
Wright-Fisher diffusion, given by the stochastic equation

$$dX(t) = \big(-\gamma_1 X(t) + \gamma_2(1 - X(t))\big)\,dt + \sqrt{X(t)(1 - X(t))}\,dB(t), \tag{13.10}$$

with $0 < X(0) < 1$. For a complete description of the process its behaviour at
the boundaries 0 and 1 should also be specified. When $X(t) = 0$ or 1, then all
the individuals at time $t$ are of the same type. Consider first the case of no
mutation: $\gamma_1 = \gamma_2 = 0$. Then the equation for $X(t)$ is

$$dX(t) = \sqrt{X(t)(1 - X(t))}\,dB(t), \tag{13.11}$$

with $0 < X(0) = x < 1$. The scale function is given by $S(x) = x$, consequently
for $0 \le a < x < b \le 1$

$$P_x(T_b < T_a) = \frac{x - a}{b - a}. \tag{13.12}$$

The process is stopped when it reaches 0 or 1, because at that time and at
all future times all the individuals are of the same type. This phenomenon
is called fixation. The probability that fixation at $A$ occurs having started
with proportion $x$ of type $A$ individuals is $x$, and with the complementary
probability fixation at $a$ occurs.

The expected time to fixation is found by Theorem 6.16 as the solution
to $Lv = -1$ with boundary conditions $v(0) = v(1) = 0$, and $Lv = \frac{x(1-x)}{2}v''$.
Solving this equation (using $\int \ln x\,dx = x\ln x - x$), we obtain that the expected
time to fixation having started with proportion $x$ of type $A$ individuals is given
by

$$v(x) = E_x\tau = -2\big((1 - x)\ln(1 - x) + x\ln x\big). \tag{13.13}$$
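
As a sanity check on (13.13), one can verify numerically that $v$ satisfies $\frac{x(1-x)}{2}v'' = -1$ inside $(0,1)$. The sketch below is illustrative only and not from the original text; it approximates $v''$ by a central finite difference, and the names `expected_fixation_time` and `generator_applied` are assumptions made for this example.

```python
import math


def expected_fixation_time(x):
    """v(x) from (13.13); v(0) = v(1) = 0 by the boundary conditions."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -2.0 * ((1.0 - x) * math.log(1.0 - x) + x * math.log(x))


def generator_applied(v, x, h=1e-4):
    """Approximate Lv(x) = x(1-x)/2 * v''(x) using a central difference for v''."""
    second_derivative = (v(x + h) - 2.0 * v(x) + v(x - h)) / h ** 2
    return 0.5 * x * (1.0 - x) * second_derivative
```

Starting from equal proportions, $x = 1/2$, formula (13.13) gives $v(1/2) = 2\ln 2$, the maximum of $v$ on $[0,1]$.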

In the model with one-way mutation when, for example, $\gamma_2 = 0$, $\gamma_1 = \gamma$,
$X(t)$ satisfies

$$dX(t) = -\gamma X(t)\,dt + \sqrt{X(t)(1 - X(t))}\,dB(t), \tag{13.14}$$

with $0 < X(0) < 1$. The process is stopped once it reaches the boundaries 0 or
1. In this case the scale function is given by $S(x) = (1 - (1-x)^{1-2\gamma})/(1 - 2\gamma)$
if $\gamma \ne 1/2$ and $S(x) = -\log(1 - x)$ when $\gamma = 1/2$. Note that by continuity of
paths $T_b \uparrow T_1$ as $b \uparrow 1$, and it is easy to see that if $\gamma \ge 1/2$ then

$$P_x(T_1 < T_0) = \lim_{b\uparrow 1} P_x(T_b < T_0) = 0. \tag{13.15}$$


It is possible to see by Theorem 6.16 that the expected time to fixation is finite. 
Thus fixation at type a occurs with probability 1. If y < 1/2 then expected 
time to fixation is finite, but there is a positive probability that fixation at 
type A also occurs. 

In the model with two-way mutation both $\gamma_1, \gamma_2 > 0$. Analysis of this
model is done by using the diffusion processes techniques described in Chapter
6, but it is too involved to be given here in detail. The important feature of this
model is that fixation does not occur and $X(t)$ admits a stationary distribution.
Stationary distributions can be found by formula (6.69). We find

$$\pi(x) = \frac{C}{\sigma^2(x)}\exp\Big(\int_{x_0}^{x}\frac{2\mu(y)}{\sigma^2(y)}\,dy\Big) = C(1 - x)^{2\gamma_1 - 1}x^{2\gamma_2 - 1}, \tag{13.16}$$

which is the density of a Beta distribution. The normalizing constant is the
reciprocal of the Beta function, $C = \Gamma(2\gamma_1 + 2\gamma_2)/(\Gamma(2\gamma_1)\Gamma(2\gamma_2))$.
Diffusion models find frequent use in population genetics, see for exam- 
ple, Ewens (1979). For more information on Wright-Fisher diffusion see, for 
example, Karlin and Taylor (1981), Chapter 15. 
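
The stationary density (13.16) can be checked numerically. The sketch below is an illustration, not part of the original text: it takes the normalizing constant to be the reciprocal of the Beta function, $\Gamma(2\gamma_1 + 2\gamma_2)/(\Gamma(2\gamma_1)\Gamma(2\gamma_2))$, as is standard for a Beta density, and integrates with a simple midpoint rule. The function names are hypothetical.

```python
import math


def stationary_density(x, gamma1, gamma2):
    """Beta density C (1-x)^(2*gamma1 - 1) x^(2*gamma2 - 1) from (13.16)."""
    a = 2.0 * gamma2  # Beta shape parameter attached to x
    b = 2.0 * gamma1  # Beta shape parameter attached to (1 - x)
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return c * (1.0 - x) ** (b - 1.0) * x ** (a - 1.0)


def midpoint_integral(f, n=20000):
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))
```

With this normalization the stationary mean is $\gamma_2/(\gamma_1 + \gamma_2)$, the point at which the drift in (13.10) vanishes.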


Remark 13.1: The theory of weak convergence is used to show that a diffusion
approximation to allele frequencies is valid. This theory is not presented
here but can be found in many texts, see for example Liptser and Shiryayev
(1989), Ethier and Kurtz (1986).


13.3 Birth-Death Processes


A Birth-Death process is a model of a population of particles developing in
time. Each particle lives for a random length of time at the end of which it
splits into two particles or dies. If there are $x$ particles in the population, then
the lifespan of a particle has an exponential distribution with parameter $a(x)$,
the split into two occurs with probability $p(x)$ and with the complementary
probability $1 - p(x)$ a particle dies producing no offspring.

Denote by $X(t)$ the population size at time $t$. The change in the population
occurs only at the death of a particle, and from state $x$ the process jumps to
$x + 1$ if a split occurs or to $x - 1$ if a particle dies without splitting. Thus the
jump variables $\xi(x)$ take values 1 and $-1$ with probabilities $p(x)$ and $1 - p(x)$
respectively. Using the fact that the minimum of exponentially distributed
random variables is exponentially distributed, we can see that the process
stays at $x$ for an exponentially distributed length of time with parameter
$\lambda(x) = x\,a(x)$.

Usually, the Birth-Death process is described in terms of birth and death
rates; in a population of size $x$, a particle is born at rate $b(x)$ and dies at the
rate $d(x)$. These refer to the infinitesimal probabilities of population increasing
and decreasing by one, namely for an integer $x$

$$P(X(t+\delta) = x + 1 \mid X(t) = x) = b(x)\delta + o(\delta),$$
$$P(X(t+\delta) = x - 1 \mid X(t) = x) = d(x)\delta + o(\delta),$$

and with the complementary probability no births or deaths happen in $(t, t+\delta)$
and the process remains unchanged

$$P(X(t+\delta) = x \mid X(t) = x) = 1 - (b(x) + d(x))\delta + o(\delta).$$

Here $o(\delta)$ denotes a function such that $\lim_{\delta\to 0} o(\delta)/\delta = 0$. Once the population
reaches zero it stays there forever,

$$P(X(t+\delta) = 0 \mid X(t) = 0) = 1.$$


It can be seen that these assumptions lead to the model of a Jump Markov
process with the holding time parameter at $x$

$$\lambda(x) = b(x) + d(x), \tag{13.17}$$

and the jump from $x$

$$\xi(x) = \begin{cases} +1 & \text{with probability } \dfrac{b(x)}{b(x) + d(x)},\\[2mm] -1 & \text{with probability } \dfrac{d(x)}{b(x) + d(x)}. \end{cases} \tag{13.18}$$

The first two moments of the jumps are easily computed to be

$$m(x) = \frac{b(x) - d(x)}{b(x) + d(x)} \quad\text{and}\quad v(x) = 1. \tag{13.19}$$

By Theorem 9.15 the compensator of $X(t)$ is given in terms of

$$\lambda(x)m(x) = b(x) - d(x),$$

which is the survival rate in the population. Thus the Birth-Death process
$X(t)$ can be written as

$$X(t) = X(0) + \int_0^t \big(b(X(s)) - d(X(s))\big)\,ds + M(t), \tag{13.20}$$

where $M(t)$ is a local martingale with the sharp bracket

$$\langle M, M\rangle(t) = \int_0^t \big(b(X(s)) + d(X(s))\big)\,ds. \tag{13.21}$$

Since $E|\xi(x)| = 1$, it follows by Theorem 9.16 that if the linear growth
condition

$$b(x) + d(x) \le C(1 + x), \tag{13.22}$$

holds then the local martingale $M$ in representation (13.20) is a square
integrable martingale.

Sometimes it is convenient to describe the model in terms of the individual
birth and death rates. In a population of size $x$, each particle is born at the rate
$\beta(x)$ and dies at the rate $\gamma(x)$. The individual rates relate to the population
rates by

$$b(x) = x\beta(x), \quad\text{and}\quad d(x) = x\gamma(x). \tag{13.23}$$

Introduce the individual survival rate $\alpha(x) = \beta(x) - \gamma(x)$. Then the stochastic
equation (13.20) becomes

$$X(t) = X(0) + \int_0^t \alpha(X(s))X(s)\,ds + M(t). \tag{13.24}$$


Birth-Death Processes with Linear Rates 


Suppose that the individual birth and death rates $\beta$ and $\gamma$ are constants. This
corresponds to linear rates $b(x) = \beta x$, $d(x) = \gamma x$ (see (13.23)). The linear
growth condition (13.22) is satisfied, and we have from equation (13.20)

$$X(t) = X(0) + \alpha\int_0^t X(s)\,ds + M(t), \tag{13.25}$$

where $\alpha = \beta - \gamma$ is the survival rate and $M(t)$ is a martingale. Moreover, by
(13.21)

$$\langle M, M\rangle(t) = (\beta + \gamma)\int_0^t X(s)\,ds. \tag{13.26}$$

Taking expectations in (13.25)

$$EX(t) = X(0) + \alpha\int_0^t EX(s)\,ds, \tag{13.27}$$

and solving this equation, we find

$$EX(t) = X(0)e^{\alpha t}. \tag{13.28}$$


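Using integration by parts (note that $e^{-\alpha t}$ has zero covariation with $X(t)$), we
have

$$d\big(X(t)e^{-\alpha t}\big) = e^{-\alpha t}\,dX(t) - \alpha e^{-\alpha t}X(t-)\,dt. \tag{13.29}$$

Since $X(t-)\,dt$ can be replaced by $X(t)\,dt$, by using the equation (13.25) in its
differential form we obtain

$$d\big(X(t)e^{-\alpha t}\big) = e^{-\alpha t}\,dX(t) - \alpha e^{-\alpha t}X(t)\,dt = e^{-\alpha t}\,dM(t). \tag{13.30}$$

Thus

$$X(t)e^{-\alpha t} = X(0) + \int_0^t e^{-\alpha s}\,dM(s). \tag{13.31}$$

Using the rule for the sharp bracket of the integral and equation (13.26), we
find

$$\Big\langle \int_0^{\cdot} e^{-\alpha s}\,dM(s), \int_0^{\cdot} e^{-\alpha s}\,dM(s)\Big\rangle(t) = \int_0^t e^{-2\alpha s}\,d\langle M, M\rangle(s) = (\beta + \gamma)\int_0^t e^{-2\alpha s}X(s)\,ds.$$

Since $EX(t) = X(0)e^{\alpha t}$, by taking expectations it follows that

$$E\Big\langle \int_0^{\cdot} e^{-\alpha s}\,dM(s), \int_0^{\cdot} e^{-\alpha s}\,dM(s)\Big\rangle(\infty) < \infty.$$

Consequently $X(t)e^{-\alpha t}$ is a square integrable martingale on $[0, \infty)$, and

$$\mathrm{Var}\big(X(t)e^{-\alpha t}\big) = E\Big\langle \int_0^{\cdot} e^{-\alpha s}\,dM(s), \int_0^{\cdot} e^{-\alpha s}\,dM(s)\Big\rangle(t) = X(0)\frac{\beta + \gamma}{\alpha}\big(1 - e^{-\alpha t}\big). \tag{13.32}$$

Since $X(t)e^{-\alpha t}$ is square integrable on $[0, \infty)$, it converges almost surely as
$t \to \infty$ to a non-degenerate limit.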
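
The jump-chain description (13.17), (13.18) translates directly into a simulation: hold at $x$ for an exponential time with parameter $(\beta + \gamma)x$, then move up with probability $\beta/(\beta + \gamma)$ and down otherwise. The following Python sketch is an illustration under these assumptions and is not part of the original text; the function names are hypothetical. A sample mean over many paths can be compared with (13.28).

```python
import math
import random


def simulate_linear_birth_death(x0, beta, gamma, t_end, rng):
    """Return X(t_end) for one path of the linear Birth-Death process."""
    x, t = x0, 0.0
    while x > 0:  # zero is absorbing
        # Holding time at x is exponential with parameter lambda(x) = (beta + gamma) * x.
        t += rng.expovariate((beta + gamma) * x)
        if t > t_end:
            break
        # Jump +1 with probability b/(b+d) = beta/(beta+gamma), otherwise -1.
        x += 1 if rng.random() < beta / (beta + gamma) else -1
    return x


def exact_mean(x0, beta, gamma, t):
    """EX(t) = X(0) exp(alpha t) with alpha = beta - gamma, as in (13.28)."""
    return x0 * math.exp((beta - gamma) * t)
```

Passing an explicitly seeded `random.Random` instance keeps the experiment reproducible; the agreement with (13.28) is only up to Monte Carlo error.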
Processes with Stabilizing Reproduction


Consider the case when the rates stabilize as population size increases, namely
$\beta(x) \to \beta$ and $\gamma(x) \to \gamma$ as $x \to \infty$. Then, clearly, $\alpha(x) = \beta(x) - \gamma(x) \to
\alpha = \beta - \gamma$. Depending on the value of $\alpha$, radically distinct modes of behaviour
occur. If $\alpha < 0$ then the process dies out with probability one. If $\alpha > 0$
then the process tends to infinity with a positive probability. If the rate of
convergence of $\alpha(x)$ to $\alpha$ is fast enough, then exponential growth persists as in
the classical case. Let $\varepsilon(x) = \alpha(x) - \alpha$. Then under some technical conditions,
the following condition is necessary and sufficient for exponential growth,


$$\int_1^\infty \frac{\varepsilon(x)}{x}\,dx < \infty. \tag{13.33}$$

If (13.33) holds then $X(t)e^{-\alpha t}$ converges almost surely and in the mean square
to a non-degenerate limit.

The case $\alpha = 0$ provides examples of slowly growing populations, such as
those with a linear rate of growth. Let $\alpha(x) > 0$ and $\alpha(x) \downarrow 0$. Among such
processes there are some that become extinct with probability one, but there
are others that have a positive probability of growing to infinity. Consider the
case when $\alpha(x) = c/x$ when $x > 0$. It is possible to show that if $c > 1/2$ then
$q = P(X(t) \to \infty) > 0$. Stochastic equation (13.24) becomes

$$X(t) = X(0) + c\int_0^t I(X(s) > 0)\,ds + M(t). \tag{13.34}$$

By taking expectations in (13.34) one can see that $\lim_{t\to\infty} EX(t)/t = cq$,
and the mean of such processes grows linearly. By using Theorem 9.19 and
Exercise 9.5 it is possible to show that for any $k$, $E(X^k(t))/t^k$ converges to the
$k$-th moment of a gamma distribution times $q$. This implies that on the set
$\{X(t) \to \infty\}$, $X(t)/t$ converges in distribution to a gamma distribution. In
other words, such processes grow linearly, and not exponentially. By changing
the rate of convergence of $\alpha(x)$ other polynomial growth rates of the population
can be obtained.

Similar results are available for population-dependent Markov Branching 
processes that generalize Birth-Death processes by allowing a random number 
of offspring at the end of a particle’s life. For details of the stochastic equation 
approach to Markov Jump processes see Klebaner (1994), where such processes
were studied as randomly perturbed linear differential equations, and Hamza and
Klebaner (1995).


13.4 Branching Processes


In this Section we look at a population model known as the age-dependent 
Bellman-Harris Branching process, see Harris (1963), Athreya and Ney (1972). 
This model generalizes the Birth and Death process model in two respects: 
firstly the lifespan of individuals need not have the exponential distribution, 
and secondly, more than one particle can be born. In the age-dependent model 
a particle lives for a random length of time with distribution G, and at the 
end of its life leaves a random number of children Y, with mean m = EY. 
All particles reproduce and die independently of each other. Denote by Z(t) 
the number of particles alive at time t. It can be seen that unless the lifetime 
distribution is exponential, the process Z(t) is not Markovian, and its analysis 
is usually done by using Renewal Theory (Harris (1963)). Here we apply 
stochastic calculus to obtain a limiting result for the population size counted 
in a special way (by reproductive value). 

Consider a collection of individuals with ages $(a^1, \ldots, a^z) = A$. It is
convenient to look at the vector of ages $A = (a^1, \ldots, a^z)$ as a counting measure $A$,
defined by $A(B) = \sum_{i=1}^{z} 1_B(a^i)$, for any Borel set $B$ in $\mathbb{R}^+$. For a function $f$
on $\mathbb{R}$ the following notations are used

$$(f, A) = \int f(x)\,A(dx) = \sum_{i=1}^{z} f(a^i).$$

The population size process in this notation is $Z(t) = (1, A_t)$. In this approach
the process of ages $A_t$ is a measure-valued process, although simpler than
those studied, e.g., in Dawson (1993). To convert a measure-valued process into a
scalar valued one, test functions are used. Test functions used on the space of
measures are of the form $F((f, \mu))$, where $F$ and $f$ are functions on $\mathbb{R}$. Let
$h(a) = \frac{G'(a)}{1 - G(a)}$ be the rate of dying at age $a$. $E$ with and without the subscript
$A$ denotes the expectation when the process starts with individuals aged
$(a^1, \ldots, a^z) = A$. The following result is obtained by direct calculations.


Theorem 13.4 For a bounded differentiable function $F$ on $\mathbb{R}$ and a
continuously differentiable function $f$ on $\mathbb{R}^+$, the following limit exists

$$\lim_{t\to 0}\frac{1}{t}E_A\big\{F((f, A_t)) - F((f, A))\big\} = \mathcal{G}F((f, A)), \tag{13.35}$$

where

$$\mathcal{G}F((f, A)) = F'((f, A))(f', A) + \sum_{j=1}^{z} h(a^j)\Big\{E_A\big(F(Yf(0) + (f, A) - f(a^j))\big) - F((f, A))\Big\}. \tag{13.36}$$

The operator $\mathcal{G}$ in (13.36) defines a generator of a measure-valued branching
process in which the movement of the particles is deterministic, namely shift.
The following result gives the integral equation (SDE) for the age process
under the test functions.


Theorem 13.5 For a bounded $C^1$ function $F$ on $\mathbb{R}$ and a $C^1$ function $f$ on
$\mathbb{R}^+$

$$F((f, A_t)) = F((f, A_0)) + \int_0^t \mathcal{G}F((f, A_s))\,ds + M_t^{F,f}, \tag{13.37}$$

where $M_t^{F,f}$ is a local martingale with the sharp bracket given by

$$\big\langle M^{F,f}, M^{F,f}\big\rangle_t = \int_0^t \mathcal{G}F^2((f, A_s))\,ds - 2\int_0^t F((f, A_s))\,\mathcal{G}F((f, A_s))\,ds. \tag{13.38}$$

Consequently,

$$E_A\big(M_t^{F,f}\big)^2 = E_A\Big(\int_0^t \mathcal{G}F^2((f, A_s))\,ds - 2\int_0^t F((f, A_s))\,\mathcal{G}F((f, A_s))\,ds\Big),$$

provided $E_A(M_t^{F,f})^2$ exists.


PROOF: The first statement is obtained by Dynkin's formula. Expression
(13.38) is obtained by letting $U_t = F((f, A_t))$ and an application of the
following result.


Theorem 13.6 Let $U_t$ be a locally square-integrable semi-martingale such that
$U_t = U_0 + A_t + M_t$, where $A_t$ is a predictable process of locally finite variation
and $M_t$ is a locally square-integrable local martingale, $A_0 = M_0 = 0$. Let
$U_t^2 = U_0^2 + B_t + N_t$, where $B_t$ is a predictable process and $N_t$ is a local martingale.
Then

$$\langle M, M\rangle_t = B_t - 2\int_0^t U_{s-}\,dA_s - \sum_{s\le t}(A_s - A_{s-})^2. \tag{13.39}$$

Of course, if $A$ is continuous (as in our applications) the last term in the above
formula (13.39) vanishes.


PROOF: By the definition of the quadratic variation process (or integration
by parts)

$$U_t^2 = U_0^2 + 2\int_0^t U_{s-}\,dU_s + [U, U]_t = U_0^2 + 2\int_0^t U_{s-}\,dA_s + 2\int_0^t U_{s-}\,dM_s + [U, U]_t.$$

Using the representation for $U_t^2$ given in the conditions of the theorem, we
obtain that $[U, U]_t - B_t + 2\int_0^t U_{s-}\,dA_s$ is a local martingale. Since $[A, A]_t =
\sum_{s\le t}(A_s - A_{s-})^2$, the result follows.


Let $v^2 = E(Y^2)$ be the second moment of the offspring distribution.
Applying Dynkin's formula to the function $F(u) = u$ (and writing $M^f$ for $M^{1,f}$),
we obtain the following


Theorem 13.7 For a $C^1$ function $f$ on $\mathbb{R}^+$

$$(f, A_t) = (f, A_0) + \int_0^t (Lf, A_s)\,ds + M_t^f, \tag{13.40}$$

where the linear operator $L$ is defined by

$$Lf = f' - hf + mhf(0), \tag{13.41}$$

and $M_t^f$ is a local square integrable martingale with the sharp bracket given by

$$\big\langle M^f, M^f\big\rangle_t = \int_0^t \big(f^2(0)v^2h + hf^2 - 2f(0)mhf, A_s\big)\,ds. \tag{13.42}$$


PROOF: The first statement is Dynkin's formula for $F(u) = u$. This function
is unbounded and the standard formula cannot be applied directly. However
it can be applied by taking smooth bounded functions that agree with $u$ on
bounded intervals, $F_n(u) = u$ for $u \le n$, and the sequence of stopping times
$T_n = \inf\{t : (f, A_t) > n\}$ as a localizing sequence. The form of the operator $L$
follows from (13.36). Similarly (13.42) follows from (13.38) by taking $F(u) = u^2$.


By taking $f$ to be a constant, $f(u) = 1$, Theorem 13.7 yields the following
corollary for the population size at time $t$, $Z(t) = (1, A_t)$.


Corollary 13.8 The compensator of $Z(t)$ is given by $(m - 1)\int_0^t (h, A_s)\,ds$.


It is useful to be able to take expectations in Dynkin’s formula. The next 
result gives a sufficient condition. 


Theorem 13.9 Let $f \ge 0$ be a $C^1$ function on $\mathbb{R}^+$ that satisfies

$$|(Lf, A)| \le C(1 + (f, A)), \tag{13.43}$$

for some constant $C$ and any $A$, and assume that $(f, A_0)$ is integrable. Then
$(f, A_t)$ and $M_t^f$ in (13.37) are also integrable with $EM_t^f = 0$.

PROOF: Let $T_n$ be a localizing sequence, then from (13.37)

$$(f, A_{t\wedge T_n}) = (f, A_0) + \int_0^{t\wedge T_n}(Lf, A_s)\,ds + M^f_{t\wedge T_n}, \tag{13.44}$$

where $M^f_{t\wedge T_n}$ is a martingale. Taking expectations we have

$$E(f, A_{t\wedge T_n}) = E(f, A_0) + E\int_0^{t\wedge T_n}(Lf, A_s)\,ds. \tag{13.45}$$

By Condition (13.43),

$$\Big|E\int_0^{t\wedge T_n}(Lf, A_s)\,ds\Big| \le E\int_0^{t\wedge T_n}|(Lf, A_s)|\,ds \le Ct + CE\int_0^{t\wedge T_n}(f, A_s)\,ds.$$

Thus we have from (13.45)

$$E(f, A_{t\wedge T_n}) \le E(f, A_0) + Ct + CE\int_0^{t\wedge T_n}(f, A_s)\,ds$$
$$\le E(f, A_0) + Ct + CE\int_0^t I(s \le T_n)(f, A_s)\,ds$$
$$\le E(f, A_0) + Ct + C\int_0^t E(f, A_{s\wedge T_n})\,ds.$$

It now follows by Gronwall's inequality (Theorem 1.20) that

$$E(f, A_{t\wedge T_n}) \le \big(E(f, A_0) + Ct\big)e^{Ct} < \infty. \tag{13.46}$$

Taking $n \to \infty$, we conclude that $E(f, A_t) < \infty$ by Fatou's lemma. Thus
$(f, A_t)$ is integrable, as it is non-negative. Now by Condition (13.43)

$$E\Big|\int_0^t (Lf, A_s)\,ds\Big| \le \int_0^t E|(Lf, A_s)|\,ds \le \int_0^t C\big(1 + E(f, A_s)\big)\,ds < \infty. \tag{13.47}$$

It follows from (13.47) that $\int_0^t (Lf, A_s)\,ds$ and its variation process $\int_0^t |(Lf, A_s)|\,ds$
are both integrable, and from (13.37) that

$$M_t^f = (f, A_t) - (f, A_0) - \int_0^t (Lf, A_s)\,ds \tag{13.48}$$

is integrable with zero mean.


For simplicity we assume that $G(u) < 1$ for all $u \in \mathbb{R}^+$. Equation (13.40) can
be analyzed through the eigenvalue problem for the operator $L$.

Theorem 13.10 Let $L$ be the operator in (13.41). Then the equation

$$Lq = rq \tag{13.49}$$

has a solution $q_r$ for any $r$. The corresponding eigenfunction (normed so that
$q(0) = 1$) is given by

$$q_r(u) = \frac{me^{ru}}{1 - G(u)}\Big(\frac{1}{m} - \int_0^u e^{-rs}\,dG(s)\Big). \tag{13.50}$$


PROOF: Since eigenfunctions are determined up to a multiplicative constant,
we can take $q(0) = 1$. Equation (13.49) is a first order linear differential
equation, and solving it we obtain the solution (13.50).


Theorem 13.11 Let $q_r$ be a positive eigenfunction of $L$ corresponding to the
eigenvalue $r$. Then $Q_r(t) = e^{-rt}(q_r, A_t)$ is a positive martingale.


PROOF: Using (13.40) and the fact that $q_r$ is an eigenfunction for $L$, we have

$$(q_r, A_t) = (q_r, A_0) + r\int_0^t (q_r, A_s)\,ds + M_t^{q_r}, \tag{13.51}$$

where $M_t^{q_r}$ is a local martingale. The functions $q_r$ clearly satisfy condition
(13.43). Therefore $(q_r, A_t)$ is integrable, and it follows from (13.51) by taking
expectations that

$$E(q_r, A_t) = e^{rt}E(q_r, A_0). \tag{13.52}$$

Using integration by parts for $e^{-rt}(q_r, A_t)$, we obtain from (13.51) that

$$dQ_r(t) = d\big(e^{-rt}(q_r, A_t)\big) = e^{-rt}\,dM_t^{q_r},$$

and

$$Q_r(t) = (q_r, A_0) + \int_0^t e^{-rs}\,dM_s^{q_r} \tag{13.53}$$

is a local martingale as an integral with respect to the local martingale $M^{q_r}$.
Since a positive local martingale is a supermartingale, and $Q_r(t) \ge 0$, $Q_r(t)$
is a supermartingale. But from (13.52) it follows that $Q_r(t)$ has a constant
mean. Thus the supermartingale $Q_r(t)$ is a martingale.


The Malthusian parameter $\alpha$ is defined as the value of $r$ which satisfies

$$m\int_0^\infty e^{-ru}\,dG(u) = 1. \tag{13.54}$$

We assume that the Malthusian parameter $\alpha$ exists and is positive, in this
case the process is called supercritical.

Theorem 13.12 There is only one bounded positive eigenfunction $V$, the
reproductive value function, corresponding to the eigenvalue $\alpha$ which is the
Malthusian parameter,

$$V(u) = \frac{me^{\alpha u}}{1 - G(u)}\int_u^\infty e^{-\alpha s}\,dG(s). \tag{13.55}$$


PROOF: It follows that for $r > \alpha$, $m\int_0^\infty e^{-ru}\,dG(u) < 1$ and the eigenfunction
$q_r$ in (13.50) is positive and grows exponentially fast or faster. For $r < \alpha$, the
eigenfunction $q_r$ takes negative values. When $r = \alpha$, $q_\alpha = V$ in (13.55). To
see that it is bounded by $m$, replace $e^{-\alpha s}$ by its largest value $e^{-\alpha u}$.
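
Equation (13.54) rarely has a closed-form solution, but since the left-hand side is decreasing in $r$ it can be solved by bisection. The following Python sketch is illustrative and not part of the original text: the function name `malthusian_parameter` and its argument `laplace_G`, which stands for the map $r \mapsto \int_0^\infty e^{-ru}\,dG(u)$, are assumptions made for this example, and the supercritical case $m > 1$ with a proper lifetime distribution is assumed.

```python
def malthusian_parameter(m, laplace_G, tol=1e-12):
    """Solve m * laplace_G(r) = 1 for r > 0 by bisection (assumes m > 1)."""
    if m <= 1.0:
        raise ValueError("a positive Malthusian parameter requires m > 1")

    def phi(r):
        return m * laplace_G(r) - 1.0

    lo, hi = 0.0, 1.0
    # phi(0) = m - 1 > 0 and phi decreases in r; enlarge hi until phi(hi) <= 0.
    while phi(hi) > 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For an exponential lifetime with rate $\lambda$ the transform is $\lambda/(\lambda + r)$, and (13.54) gives $\alpha = \lambda(m - 1)$, which provides a closed-form check of the routine.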


This is the main result.


Theorem 13.13 $W_t = e^{-\alpha t}(V, A_t)$ is a positive square integrable martingale,
and therefore converges almost surely and in $L^2$ to the non-degenerate limit
$W \ge 0$, $EW = (V, A_0) > 0$ and $P(W > 0) > 0$.


PROOF: That $W_t$ is a martingale follows by Theorem 13.11. It is positive,
therefore it converges almost surely to a limit $W \ge 0$ by the martingale
convergence Theorem 7.11. To show that the limit is not identically zero, we
show that convergence is also in $L^2$, i.e. the second moments (and the first
moments) converge, implying that $EW = \lim_{t\to\infty} EW_t = EW_0 = (V, A_0)$.

It follows from (13.42), using $V(0) = 1$ and writing $\sigma^2 = v^2 - m^2$ for the
variance of the offspring distribution, that

$$\big\langle M^V, M^V\big\rangle_t = \int_0^t \big((\sigma^2 + (m - V)^2)h, A_s\big)\,ds. \tag{13.56}$$

By (13.53)

$$W_t = (V, A_0) + \int_0^t e^{-\alpha s}\,dM_s^V, \tag{13.57}$$

and we obtain that

$$\langle W, W\rangle_t = \int_0^t e^{-2\alpha s}\,d\big\langle M^V, M^V\big\rangle_s = \int_0^t e^{-2\alpha s}\big((\sigma^2 + (m - V)^2)h, A_s\big)\,ds. \tag{13.58}$$

Now it is easy to check that there is a constant $C$ and $r$, $\alpha < r < 2\alpha$, such
that

$$\big((\sigma^2 + (m - V)^2)h, A_s\big) \le C(q_r, A_s), \tag{13.59}$$

and using Theorem 13.11 (see (13.52)),

$$E\int_0^\infty e^{-2\alpha s}\big((\sigma^2 + (m - V)^2)h, A_s\big)\,ds \le C(q_r, A_0)\int_0^\infty e^{(r - 2\alpha)s}\,ds < \infty. \tag{13.60}$$

This implies from (13.58) that $E\langle W, W\rangle_\infty < \infty$. Therefore $W_t$ is a square
integrable martingale (see Theorem 8.27) and the result follows.


The martingale $\{W_t\}$ is given in Harris (1963). The convergence of the
population numbers $Z(t)e^{-\alpha t}$ is obtained from that of $W_t$ by using a Tauberian
theorem. For details and extensions of the model to the population-dependent
case see Jagers and Klebaner (2000).


13.5 Stochastic Lotka-Volterra Model


Deterministic Lotka-Volterra system


The Lotka-Volterra system of ordinary differential equations (Lotka (1925),
Volterra (1926))

$$\dot{x}_t = ax_t - bx_ty_t, \qquad \dot{y}_t = cx_ty_t - dy_t, \tag{13.61}$$

with positive $x_0$, $y_0$ and positive parameters $a$, $b$, $c$, $d$ describes the behaviour
of a prey-predator system in terms of the prey and predator "intensities" $x_t$
and $y_t$. Here, $a$ is the rate of increase of prey in the absence of predators, $d$ is
the rate of decrease of predators in the absence of prey, while the rate of decrease
in prey is proportional to the number of predators, $by_t$, and similarly the rate
of increase in predators is proportional to the number of prey, $cx_t$. The system
(13.61) is one of the simplest non-linear systems.

Since the population numbers are discrete, a description of the predator-prey
model in terms of continuous intensities $x_t$, $y_t$ is based implicitly on a
natural assumption that the numbers of both populations are large, and the
intensities are obtained by a normalization of population numbers by a large
parameter $K$. Thus (13.61) is an approximation, an asymptotic description of
the interaction between the predator and the prey. Although this model may
capture some essential elements in that interaction, it is not suitable to answer
questions of extinction of populations, as extinction never occurs in the
deterministic model, see Figure 13.1 for the pair $x_t$, $y_t$ in the phase plane.

We introduce here a probabilistic model which has as its limit the deter- 
ministic Lotka-Volterra model, evolves in continuous time according to the 
same local interactions and allows us to evaluate asymptotically the time for 
extinction of prey species. 

There is a vast amount of literature on the Lotka-Volterra model and a
history of research on stochastic perturbations of this system: exact, approximate
and numerical.

The system (13.61) possesses the first integral which is a closed orbit in
the first quadrant of the phase plane $x, y$. It is given by

$$r(x, y) = cx - d\log x + by - a\log y + r_0, \tag{13.62}$$

where $r_0$ is an arbitrary constant. It depends only on the initial point $(x_0, y_0)$,
see Figure below.


Figure 13.1: First integral $r(x, y)$, $x_0 = 0.3$, $y_0 = 3$, $a = 5$, $b = 1$, $c = 5$, $d = 1$.
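
That $r(x, y)$ in (13.62) is constant along solutions of (13.61) can be checked numerically. The sketch below is an illustration, not part of the original text: it integrates (13.61) with a classical fourth-order Runge-Kutta scheme (a choice made here, not a method used in the text) and evaluates the first integral with $r_0 = 0$ along the computed path. All function names are hypothetical.

```python
import math


def lv_rhs(x, y, a, b, c, d):
    """Right-hand side of the Lotka-Volterra system (13.61)."""
    return a * x - b * x * y, c * x * y - d * y


def first_integral(x, y, a, b, c, d):
    """r(x, y) from (13.62) with the arbitrary constant r_0 set to 0."""
    return c * x - d * math.log(x) + b * y - a * math.log(y)


def rk4_orbit(x0, y0, a, b, c, d, dt, n_steps):
    """Integrate (13.61) with classical RK4 and return the list of (x, y) points."""
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(n_steps):
        k1x, k1y = lv_rhs(x, y, a, b, c, d)
        k2x, k2y = lv_rhs(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, a, b, c, d)
        k3x, k3y = lv_rhs(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, a, b, c, d)
        k4x, k4y = lv_rhs(x + dt * k3x, y + dt * k3y, a, b, c, d)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        y += dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
        path.append((x, y))
    return path
```

With the parameters of Figure 13.1 the computed orbit stays in the first quadrant and the value of the first integral drifts only by the discretization error of the scheme.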


Stochastic Lotka-Volterra system 


Let $X_t$ and $Y_t$ be numbers of prey and predators at time $t$. We start with
simple equations for prey-predator populations

$$X_t = X_0 + \pi'_t - \pi''_t, \qquad Y_t = Y_0 + \hat{\pi}'_t - \hat{\pi}''_t, \tag{13.63}$$

where $\pi'_t$ is the number of prey born up to time $t$, $\pi''_t$ is the number of prey
killed up to time $t$, $\hat{\pi}'_t$ is the number of predators born up to time $t$, $\hat{\pi}''_t$ is the
number of predators that have died up to time $t$.

We assume that $\pi'_t$, $\pi''_t$, $\hat{\pi}'_t$, $\hat{\pi}''_t$ are Poisson processes with the following
state-dependent random rates $aX_t$, $\frac{b}{K}X_tY_t$, $\frac{c}{K}X_tY_t$, $dY_t$ respectively and disjoint
jumps (the latter assumption reflects the fact that in a short time interval
$(t, t + \delta t)$ only one prey might be born and only one might be killed, only one
predator might be born and only one might die, with the above-mentioned
intensities; moreover, all these events are disjoint in time).

Assume $X_0 = Kx_0$ and $Y_0 = Ky_0$ for some fixed positive $x_0$, $y_0$ and a large
integer parameter $K$. Introduce the prey and predator populations normalized
by $K$

$$x_t^K = \frac{X_t}{K}, \quad\text{and}\quad y_t^K = \frac{Y_t}{K}.$$

In terms of $x_t^K$ and $y_t^K$ the introduced intensities for Poisson processes can be
written as $aKx_t^K$, $bKx_t^Ky_t^K$, $cKx_t^Ky_t^K$, $dKy_t^K$.
Existence 


In this section, we show that the random process $(X_t, Y_t)$ is well defined by
equations (13.63). To this end, let us introduce four independent sequences of
Poisson processes

$$\Pi_t^a = (\Pi_t^a(1), \Pi_t^a(2), \ldots),$$
$$\Pi_t^{b/K} = (\Pi_t^{b/K}(1), \Pi_t^{b/K}(2), \ldots),$$
$$\Pi_t^{c/K} = (\Pi_t^{c/K}(1), \Pi_t^{c/K}(2), \ldots),$$
$$\Pi_t^d = (\Pi_t^d(1), \Pi_t^d(2), \ldots).$$

Each of them is a sequence of i.i.d. Poisson processes with rates $a$, $\frac{b}{K}$, $\frac{c}{K}$, $d$
respectively. Define the processes $(X_t, Y_t)$ by the system of Itô equations

$$X_t = X_0 + \int_0^t \sum_{n\ge 1} I(X_{s-} \ge n)\,d\Pi_s^a(n) - \int_0^t \sum_{n\ge 1} I(X_{s-}Y_{s-} \ge n)\,d\Pi_s^{b/K}(n),$$
$$Y_t = Y_0 + \int_0^t \sum_{n\ge 1} I(X_{s-}Y_{s-} \ge n)\,d\Pi_s^{c/K}(n) - \int_0^t \sum_{n\ge 1} I(Y_{s-} \ge n)\,d\Pi_s^d(n), \tag{13.64}$$

governed by these Poisson processes, which obviously has a unique solution
until the time of explosion, on the time interval $[0, T_\infty)$, where

$$T_\infty = \inf\{t > 0 : X_t \vee Y_t = \infty\}.$$


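The Poisson processes with state-dependent rates in (13.63) are obtained as
follows, by matching the terms of (13.63) and (13.64):

$$\pi'_t = \int_0^t \sum_{n\ge 1} I(X_{s-} \ge n)\,d\Pi_s^a(n), \qquad \pi''_t = \int_0^t \sum_{n\ge 1} I(X_{s-}Y_{s-} \ge n)\,d\Pi_s^{b/K}(n),$$
$$\hat{\pi}'_t = \int_0^t \sum_{n\ge 1} I(X_{s-}Y_{s-} \ge n)\,d\Pi_s^{c/K}(n), \qquad \hat{\pi}''_t = \int_0^t \sum_{n\ge 1} I(Y_{s-} \ge n)\,d\Pi_s^d(n).$$

It is easy to see that $\pi'_t$, $\pi''_t$, $\hat{\pi}'_t$, $\hat{\pi}''_t$ have the required properties. Their jumps
are of size one. Since all Poisson processes are independent, their jumps are
disjoint, so that the jumps of $\pi'_t$, $\pi''_t$, $\hat{\pi}'_t$, $\hat{\pi}''_t$ are disjoint as well. To describe their
intensities introduce a stochastic basis $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{t\ge 0}, P)$; the filtration
$\mathbb{F}$ is generated by all Poisson processes and satisfies the general conditions.
Then, obviously, the random process

$$A'_t = \int_0^t \sum_{n\ge 1} I(X_s \ge n)\,a\,ds$$

is adapted and continuous, hence predictable. Using $\int_0^t f(s-)\,ds = \int_0^t f(s)\,ds$,

$$\pi'_t - A'_t = \int_0^t \sum_{n\ge 1} I(X_{s-} \ge n)\,d\Pi_s^a(n) - \int_0^t \sum_{n\ge 1} I(X_{s-} \ge n)\,a\,ds = \int_0^t \sum_{n\ge 1} I(X_{s-} \ge n)\,d\big(\Pi_s^a(n) - as\big).$$

Thus $\pi'_t - A'_t$ is an integral with respect to a martingale, hence is a local
martingale. Therefore $A'_t$ is the compensator of $\pi'_t$. Since $X_s$ is an
integer-valued random variable, $\sum_{n\ge 1} I(X_s \ge n) = X_s$, giving that $A'_t = \int_0^t aX_s\,ds$ and
the intensity of $\pi'_t$ is $aX_t$, as claimed. Analogously, other compensators are

$$A''_t = \int_0^t \frac{b}{K}X_sY_s\,ds, \qquad \hat{A}'_t = \int_0^t \frac{c}{K}X_sY_s\,ds, \qquad \hat{A}''_t = \int_0^t dY_s\,ds,$$

and thus all other intensities have the required form.
We now show that the process $(X_t, Y_t)_{t\ge 0}$ does not explode.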


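Theorem 13.14

$$P(T_\infty = \infty) = 1.$$


PROOF: Set $T_n^X = \inf\{t > 0 : X_t \ge n\}$, $n \ge 1$, and denote by $T_\infty^X = \lim_{n\to\infty} T_n^X$.
Using (13.64) we obtain

$$EX_{t\wedge T_n^X} \le X_0 + \int_0^t aEX_{s\wedge T_n^X}\,ds.$$

By Gronwall's inequality, $EX_{T_n^X\wedge T} \le X_0e^{aT}$ for every $T > 0$. Hence by
Fatou's lemma $EX_{T_\infty^X\wedge T} \le X_0e^{aT}$. Consequently, for all $T > 0$

$$P(T_\infty^X \le T) = 0.$$

Set $T_\ell^Y = \inf\{t : Y_t \ge \ell\}$, $\ell \ge 1$, and denote by $T_\infty^Y = \lim_{\ell\to\infty} T_\ell^Y$. Using
(13.64) we obtain

$$EY_{t\wedge T_n^X\wedge T_\ell^Y} \le Y_0 + \int_0^t \frac{c}{K}E\big(X_{s\wedge T_n^X\wedge T_\ell^Y}\,Y_{s\wedge T_n^X\wedge T_\ell^Y}\big)\,ds \le Y_0 + \frac{c}{K}n\int_0^t EY_{s\wedge T_n^X\wedge T_\ell^Y}\,ds.$$

Hence, by Gronwall's inequality, for every $T > 0$, $EY_{T\wedge T_n^X\wedge T_\ell^Y} \le Y_0e^{\frac{c}{K}nT}$ and
by Fatou's lemma,

$$EY_{T\wedge T_n^X\wedge T_\infty^Y} \le Y_0e^{\frac{c}{K}nT}, \quad n \ge 1.$$

Consequently, $P(T_\infty^Y \le T_n^X \wedge T) = 0$ for all $T > 0$, $n \ge 1$ and, since $T_n^X \uparrow \infty$ as
$n \to \infty$, we obtain

$$P(T_\infty^Y \le T) = 0.$$

Since $T_\infty = T_\infty^X \wedge T_\infty^Y$, $P(T_\infty \le T) \le P(T_\infty^X \le T) + P(T_\infty^Y \le T)$, and we have
$P(T_\infty \le T) = 0$ for any $T > 0$.


Corollary 13.15 For $T_n = \inf\{t : X_t \vee Y_t \ge n\}$, and for all $T > 0$

$$\lim_{n\to\infty} P(T_n \le T) = 0.$$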


The above description of the model allows us to claim that $(X_t, Y_t)$ is a
continuous-time pure jump Markov process with jumps of two possible sizes
in both coordinates: "1" and "$-1$" and infinitesimal transition probabilities
(as $\delta t \to 0$)

$$P(X_{t+\delta t} = X_t + 1 \mid X_t, Y_t) = aX_t\,\delta t + o(\delta t),$$
$$P(X_{t+\delta t} = X_t - 1 \mid X_t, Y_t) = \frac{b}{K}X_tY_t\,\delta t + o(\delta t),$$
$$P(Y_{t+\delta t} = Y_t + 1 \mid X_t, Y_t) = \frac{c}{K}X_tY_t\,\delta t + o(\delta t),$$
$$P(Y_{t+\delta t} = Y_t - 1 \mid X_t, Y_t) = dY_t\,\delta t + o(\delta t).$$

Semimartingale decomposition for (x,y) 


Let A’, Al’, Al, A” be the compensators of 71, 7)’,7,,7// defined above. Intro- 
duce martingales 


fi T 1 no n Nn IS U KP on qu 
Mi =m,- An Mp =m — Ai, Mi =T — Ap My =T — Ap, 
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and also normalized martingales 


/ 
x __ Mi-M 


aK M'—M", 
m; K 


d = ——. 13.65 

and Mi K ( ) 
Then, from (13.64) it follows that the process (x¥ , y) admits the semimartin- 
gale decomposition 


t 
ok = aot i lack — bok y¥|ds + m¥, 
0 
T 
yÉ =yo+ | fer y® — dyX]ds+ mk, (13.66) 
0 


which is a stochastic analogue (in integral form) of the equations (13.61). 

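In the sequel we need quadratic variations of the martingales in (13.66). By Theorem 9.3 all martingales are locally square integrable and possess the predictable quadratic variations

$$\langle M', M'\rangle_t = A'_t,\quad \langle M'', M''\rangle_t = A''_t \quad\text{and}\quad \langle \hat M', \hat M'\rangle_t = \hat A'_t,\quad \langle \hat M'', \hat M''\rangle_t = \hat A''_t, \tag{13.67}$$

and zero covariations $\langle M', M''\rangle_t = 0, \ldots, \langle \hat M', \hat M''\rangle_t = 0$, since the jumps of $\pi'_t, \pi''_t, \hat\pi'_t, \hat\pi''_t$ are disjoint. Hence we obtain the sharp brackets of the martingales in equations (13.66),

$$\langle m^K, \hat m^K\rangle_t = 0,$$
$$\langle m^K, m^K\rangle_t = \frac{1}{K}\int_0^t \bigl(a x^K_s + b x^K_s y^K_s\bigr)\,ds,$$
$$\langle \hat m^K, \hat m^K\rangle_t = \frac{1}{K}\int_0^t \bigl(c x^K_s y^K_s + d y^K_s\bigr)\,ds. \tag{13.68}$$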
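As a check on the first bracket in (13.68), here is a short derivation. It is a sketch under two assumptions taken from the transition rates listed earlier rather than quoted from the text: the compensators of the upward and downward jumps of $X$ are $A'_t = \int_0^t aX_s\,ds$ and $A''_t = \int_0^t \frac{b}{K}X_sY_s\,ds$, and the normalized processes are $x^K_t = X_t/K$, $y^K_t = Y_t/K$.

```latex
\begin{aligned}
\langle m^K, m^K\rangle_t
 &= \frac{1}{K^2}\Bigl(\langle M', M'\rangle_t + \langle M'', M''\rangle_t\Bigr)
  = \frac{1}{K^2}\bigl(A'_t + A''_t\bigr) \\
 &= \frac{1}{K^2}\int_0^t \Bigl(aX_s + \frac{b}{K}X_sY_s\Bigr)\,ds
  = \frac{1}{K}\int_0^t \bigl(a x^K_s + b x^K_s y^K_s\bigr)\,ds .
\end{aligned}
```

The cross term vanishes because the two martingales have zero covariation. The factor $1/K$ that survives is what drives the martingale terms to zero as $K \to \infty$.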
0 


Note that the stochastic equations above do not satisfy the linear growth condition in $x^K, y^K$. Nevertheless, the solution exists for all $t$.


Deterministic (Fluid) approximation 


We now show that the Lotka-Volterra equations (13.61) describe a limit (also known as fluid approximation) for the family $(x^K_t, y^K_t)$ as the parameter $K \to \infty$. Results on the fluid approximation for discontinuous Markov processes can be found in Kurtz (1981).


Theorem 13.16 For any $T > 0$ and $\eta > 0$

$$\lim_{K\to\infty} P\Bigl(\sup_{t\le T}\bigl(|x^K_t - x_t| + |y^K_t - y_t|\bigr) > \eta\Bigr) = 0.$$
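PROOF: Set
$$\tau^K_n = \inf\{t : x^K_t \vee y^K_t \ge n\}. \tag{13.69}$$

By Corollary 13.15, since $\tau^K_n = \tau_{nK}$,

$$\lim_{n\to\infty}\limsup_{K\to\infty} P(\tau^K_n \le T) = 0. \tag{13.70}$$

Hence, it suffices to show that for every $n \ge 1$,

$$\lim_{K\to\infty} P\Bigl(\sup_{t\le\tau^K_n\wedge T}\bigl(|x^K_t - x_t| + |y^K_t - y_t|\bigr) > \eta\Bigr) = 0. \tag{13.71}$$

Since $\sup_{t\le\tau^K_n\wedge T}(x^K_t \vee y^K_t) \le n + 1$, there is a constant $L_n$, depending on $n$ and $T$, such that for $t \le \tau^K_n \wedge T$

$$\bigl|(a x^K_t - b x^K_t y^K_t) - (a x_t - b x_t y_t)\bigr| \le L_n\bigl(|x^K_t - x_t| + |y^K_t - y_t|\bigr),$$
$$\bigl|(c x^K_t y^K_t - d y^K_t) - (c x_t y_t - d y_t)\bigr| \le L_n\bigl(|x^K_t - x_t| + |y^K_t - y_t|\bigr).$$

These inequalities and (13.61), (13.66) imply

$$|x^K_{\tau^K_n\wedge t} - x_{\tau^K_n\wedge t}| + |y^K_{\tau^K_n\wedge t} - y_{\tau^K_n\wedge t}| \le 2L_n\int_0^t \bigl(|x^K_{s\wedge\tau^K_n} - x_{s\wedge\tau^K_n}| + |y^K_{s\wedge\tau^K_n} - y_{s\wedge\tau^K_n}|\bigr)\,ds + \sup_{t\le\tau^K_n\wedge T}|m^K_t| + \sup_{t\le\tau^K_n\wedge T}|\hat m^K_t|.$$

Now, by Gronwall's inequality we find

$$\sup_{t\le\tau^K_n\wedge T}\bigl(|x^K_t - x_t| + |y^K_t - y_t|\bigr) \le e^{2L_nT}\Bigl(\sup_{t\le\tau^K_n\wedge T}|m^K_t| + \sup_{t\le\tau^K_n\wedge T}|\hat m^K_t|\Bigr).$$

Therefore (13.71) holds if both $\sup_{t\le\tau^K_n\wedge T}|m^K_t|$ and $\sup_{t\le\tau^K_n\wedge T}|\hat m^K_t|$ converge in probability to zero as $K \to \infty$. By (13.68) and definition (13.69) of $\tau^K_n$,

$$E\langle m^K, m^K\rangle_{\tau^K_n\wedge T} \le \frac{1}{K}\bigl(a(n+1) + b(n+1)^2\bigr)T.$$

Thus by Doob's inequality for martingales (8.47)

$$E\Bigl(\sup_{t\le\tau^K_n\wedge T}(m^K_t)^2\Bigr) \le 4E\bigl(\langle m^K, m^K\rangle_{\tau^K_n\wedge T}\bigr) \to 0$$

as $K \to \infty$. This implies $\sup_{t\le\tau^K_n\wedge T}|m^K_t| \to 0$ in probability as $K \to \infty$. The second term $\sup_{t\le\tau^K_n\wedge T}|\hat m^K_t|$ is treated similarly, and the proof is complete.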


Using the stochastic Lotka-Volterra model one can evaluate asymptotically, 
when K is large, the time to extinction of prey species, as well as the likely 
trajectory to extinction. One such trajectory is in the Figure below. This anal- 
ysis is done by using the Large Deviations Theory, see Klebaner and Liptser 
(2001). 


Figure 13.2: A likely path to extinction. 


13.6 Exercises 


Exercise 13.1: Find the expected time to extinction $ET_0$ in the Branching diffusion with $\alpha < 0$. Hint: use Theorem 6.16 and formula (6.98) to find $E(T_0 \wedge T_b)$. $ET_0 = \lim_{b\to\infty} E(T_0 \wedge T_b)$ by monotone convergence.


Exercise 13.2: Let $X(t)$ be a branching diffusion satisfying SDE (13.1).
1. Let $c = 2\alpha/\sigma^2$. Show that $e^{-cX(t)}$ is a martingale.


2. Let $T$ be the time to extinction, $T = \inf\{t : X_t = 0\}$, where $T = \infty$ if $X_t > 0$ for all $t \ge 0$, and let $q(x) = P(T \le t)$ be the probability that extinction occurs by time $t$ when the initial population size is $x$. Prove that $q(x) \le e^{-cx}$.


Exercise 13.3: A model for population growth is given by the following SDE $dX(t) = 2X(t)dt + \sqrt{X(t)}\,dB(t)$, and $X(0) = x > 0$. Find the probability that the population doubles its initial size $x$ before it becomes extinct. Show that when the initial population size $x$ is small then this probability is approximately 1/2, but when $x$ is large this probability is nearly one.


Exercise 13.4: Check that the distribution in (13.16) in the Wright-Fisher 
diffusion is stationary. 


Exercise 13.5: Let a deterministic growth model be given by the differential equation
$$dx(t) = g(x(t))dt, \quad x_0 > 0,\ t \ge 0,$$
for a positive function $g(x)$ and consider its stochastic analogue
$$dX(t) = g(X(t))dt + \sigma(X(t))dB(t), \quad X(0) > 0.$$

One way to analyze the stochastic equation is by comparison with the deterministic solution.


1. Find $G(x)$ such that $G(x(t)) = G(x(0)) + t$ and consider $Y(t) = G(X(t))$. Find $dY(t)$.


2. Let $g(x) = x^r$, $0 < r < 1$, and $\sigma(x)/x^r \to 0$ as $x \to \infty$. Give conditions on $g(x)$ and $\sigma^2(x)$ so that the Law of Large Numbers holds for $Y(t)$, that is, $Y(t)/t \to 1$, as $t \to \infty$ on the set $\{Y(t) \to \infty\}$.


A systematic analysis of this model is given in Keller et al. (1987). 

Exercise 13.6: Let $L_1, L_2, \ldots, L_x$ be independent exponentially distributed random variables with parameter $a(x)$ (they represent the lifespans of particles in a population of size $x$). Show that $\min(L_1, L_2, \ldots, L_x)$ has an exponential distribution with parameter $xa(x)$. (In a branching model the change in the population size occurs when a particle dies, that is, at $\min(L_1, L_2, \ldots, L_x)$.)
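The claim in Exercise 13.6 can be sanity-checked numerically before proving it. The sketch below is only a Monte Carlo illustration, not a proof; the population size `x = 5` and rate `0.4` are arbitrary values chosen for the example, and the function name is hypothetical.

```python
import random

def mean_min_lifespan(x, rate, trials=20000, seed=0):
    """Monte Carlo estimate of E[min(L_1, ..., L_x)] for i.i.d. Exp(rate) lifespans."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Draw x independent exponential lifespans and keep the first death time.
        total += min(rng.expovariate(rate) for _ in range(x))
    return total / trials

x, rate = 5, 0.4              # illustrative population size and per-particle rate a(x)
estimate = mean_min_lifespan(x, rate)
exact = 1.0 / (x * rate)      # mean of an exponential variable with parameter x * a(x)
print(estimate, exact)
```

The estimate should sit close to `1 / (x * rate)`, which is the mean one expects if the minimum is exponential with parameter $xa(x)$.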


Exercise 13.7: (Birth-Death processes stochastic representation)
Let $N^\lambda_k(t)$ and $N^\mu_k(t)$, $k \ge 1$, be two independent sequences of independent Poisson processes with rates $\lambda$ and $\mu$. Let $X(0) > 0$ and for $t > 0$ let $X(t)$ satisfy

$$X(t) = X(0) + \int_0^t \sum_{k\ge 1} I(X(s-) \ge k)\,dN^\lambda_k(s) - \int_0^t \sum_{k\ge 1} I(X(s-) \ge k)\,dN^\mu_k(s).$$

1. Show that $X(t)$ is a Birth-Death process and identify the rates.


2. Give the semimartingale decomposition for $X(t)$.


Chapter 14 


Applications in 
Engineering and Physics 


In this chapter methods of Stochastic Calculus are applied to the Filtering 
problem in Engineering and Random Oscillators in Physics. The Filtering 
problem consists of finding the best estimator of a signal when observations 
are contaminated by noise. For a number of classical equations of motions in 
Physics we find stationary densities when the motion is subjected to random 
excitations. 


14.1 Filtering 


The filtering problem is the problem of estimation of a signal contaminated by noise. Let $Y(t)$ be the observation process, and let $\mathcal{F}^Y_t$ denote the information available by observing the process up to time $t$, that is $\mathcal{F}^Y_t = \sigma(Y(s), s \le t)$. The observation $Y(t)$ at time $t$ is the result of a deterministic transformation of the signal process $X(s)$, $s \le t$, typically a linear transformation, to which a random noise is added. The filtering problem is to find the "best" estimate $\hat X(t)$ of the signal $X(t)$ on the basis of all the observations $Y(s)$, $s \le t$, or $\mathcal{F}^Y_t$. The "best" is understood in the sense of the smallest estimation error $E\bigl((X(t) - Z(t))^2 \mid \mathcal{F}^Y_t\bigr)$, when $Z(t)$ varies over all $\mathcal{F}^Y_t$-measurable processes. Denote for an adapted process $h(t)$

$$\pi_t(h) = E(h(t) \mid \mathcal{F}^Y_t). \tag{14.1}$$

Then by Theorem 2.26 (see also Exercise 14.2) the filtering problem consists in finding $\pi_t(X)$.

Two main results of stochastic calculus are used to solve the filtering problem: Lévy's characterization of Brownian motion and the predictable representation property of Brownian filtration.


General Non-linear Filtering Model 


Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, and let $(\mathcal{F}_t)$, $0 \le t \le T$, be a non-decreasing family of right continuous σ-algebras of $\mathcal{F}$, satisfying the usual conditions, and supporting adapted processes $X(t)$ and $Y(t)$.

We aim at finding $\pi_t(h)$ for an adapted process $h(t)$. In particular, we can find $E(g(X(t)) \mid \mathcal{F}^Y_t)$, for a function of a real variable $g$. When $g$ ranges over a set of test functions, the conditional distribution of $X(t)$ given $\mathcal{F}^Y_t$ is obtained. Moreover, by taking $g(x) = x$, we obtain the best estimator $\hat X(t)$. We assume that the process $h(t)$, which we want to filter, and the observation process $Y(t)$ satisfy the SDEs

$$dh(t) = H(t)dt + dM(t),$$
$$dY(t) = A(t)dt + B(Y(t))dW(t), \tag{14.2}$$

where:

1. The process $M(t)$ is an $\mathcal{F}_t$-martingale.

2. The process $W(t)$ is an $\mathcal{F}_t$-Brownian motion.

3. The processes $H(t)$ and $A(t)$ are random, satisfying with probability one $\int_0^T |H(t)|dt < \infty$, $\int_0^T |A(t)|dt < \infty$.

4. $\sup_{t\le T} E(h^2(t)) < \infty$, $\int_0^T EH^2(t)dt < \infty$, $\int_0^T EA^2(t)dt < \infty$.

5. The diffusion coefficient of the observation process $Y(t)$ is a function $B(y)$ of $Y(t)$ only, and not of $X$. $B^2(y) \ge C > 0$, and $B^2$ satisfies the Lipschitz and the linear growth conditions.
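Theorem 14.1 For each $t$, $0 \le t \le T$,

$$d\pi_t(h) = \pi_t(H)dt + \left(\pi_t(D) + \frac{\pi_t(hA) - \pi_t(h)\pi_t(A)}{B(Y(t))}\right)d\bar W(t), \tag{14.3}$$

where $d\bar W(t) = \dfrac{dY(t) - \pi_t(A)dt}{B(Y(t))}$ is an $\mathcal{F}^Y_t$-Brownian motion and $D(t) = \dfrac{d\langle M, W\rangle(t)}{dt}$. $\bar W(t)$ is called the innovation process.

We outline the main ideas of the proof. By (14.2)

$$h(t) = h(0) + \int_0^t H(s)ds + M(t) \tag{14.4}$$

and

$$E(h(t) \mid \mathcal{F}^Y_t) = E(h(0) \mid \mathcal{F}^Y_t) + E\left(\int_0^t H(s)ds \,\Big|\, \mathcal{F}^Y_t\right) + E(M(t) \mid \mathcal{F}^Y_t).$$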
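The difficulty in calculation of the conditional expectations given $\mathcal{F}^Y_t$ is that the processes, as well as the σ-fields, both depend on $t$.


Theorem 14.2 Let $A(t)$ be an $\mathcal{F}_t$-adapted process such that $\int_0^T E(|A(t)|)dt < \infty$, and $V(t) = \int_0^t A(s)ds$. Then

$$\pi_t(V) - \int_0^t \pi_s(A)ds = E\left(\int_0^t A(s)ds \,\Big|\, \mathcal{F}^Y_t\right) - \int_0^t E(A(s) \mid \mathcal{F}^Y_s)ds$$

is an $\mathcal{F}^Y_t$-martingale.

PROOF: Let $\tau \le T$ be an $\mathcal{F}^Y_t$-stopping time. By Theorem 7.17 it is enough to show that $E(\pi_\tau(V)) = E\bigl(\int_0^\tau \pi_s(A)ds\bigr)$.

$$E(\pi_\tau(V)) = E(V(\tau)) \quad\text{by the law of double expectation}$$
$$= E\left(\int_0^\tau A(s)ds\right) = \int_0^T E\bigl(I(s \le \tau)A(s)\bigr)ds$$
$$= \int_0^T E\bigl(I(s \le \tau)\pi_s(A)\bigr)ds \quad\text{since } I(s \le \tau) \text{ is } \mathcal{F}^Y_s\text{-measurable}$$
$$= E\left(\int_0^\tau \pi_s(A)ds\right).$$

Note that by the conditional version of Fubini's theorem,

$$E\left(\int_0^t A(s)ds \,\Big|\, \mathcal{G}\right) = \int_0^t E(A(s) \mid \mathcal{G})ds. \tag{14.5}$$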


Verification of the next result is straightforward and is left as an Exercise 14.4. 


Theorem 14.3 Let $M(t)$ be an $\mathcal{F}_t$-martingale. Then $\pi_t(M)$ is an $\mathcal{F}^Y_t$-martingale.

Corollary 14.4

$$\pi_t(h) = \pi_0(h) + \int_0^t \pi_s(H)ds + M_1(t) + M_2(t) + M_3(t), \tag{14.6}$$

where $M_i(t)$ are $\mathcal{F}^Y_t$-martingales null at zero; $M_1(t) = E(h(0) \mid \mathcal{F}^Y_t) - \pi_0(h)$, $M_2(t) = E\bigl(\int_0^t H(s)ds \mid \mathcal{F}^Y_t\bigr) - \int_0^t \pi_s(H)ds$, $M_3(t) = E(M(t) \mid \mathcal{F}^Y_t)$.


PROOF: Using (14.4), the first term $E(h(0) \mid \mathcal{F}^Y_t)$ is a (Doob-Lévy) martingale. The second term is a martingale by Theorem 14.2, the third by the above Theorem 14.3.


We want to use a representation of $\mathcal{F}^Y_t$-martingales. This is done by using the innovation process.

Theorem 14.5 The innovation process $\bar W(t) = \int_0^t \dfrac{dY(s) - \pi_s(A)ds}{B(Y(s))}$ is an $\mathcal{F}^Y_t$-Brownian motion. Moreover,

$$dY(t) = \pi_t(A)dt + B(Y(t))d\bar W(t). \tag{14.7}$$

PROOF: By using (14.2)

$$\bar W(t) = \int_0^t \frac{A(s) - E(A(s) \mid \mathcal{F}^Y_s)}{B(Y(s))}ds + W(t). \tag{14.8}$$

For $t > t'$, write

$$E(\bar W(t) - \bar W(t') \mid \mathcal{F}^Y_{t'}) = E(W(t) - W(t') \mid \mathcal{F}^Y_{t'}) + \int_{t'}^t E\left(\frac{A(s) - E(A(s) \mid \mathcal{F}^Y_s)}{B(Y(s))} \,\Big|\, \mathcal{F}^Y_{t'}\right)ds.$$

The right-hand side is zero, the first term by Theorem 14.3, and the second by the conditional version of Fubini's theorem. Thus $\bar W(t)$ is an $\mathcal{F}^Y_t$-martingale. It is clearly continuous. It follows from (14.8) that $[\bar W, \bar W](t) = [W, W](t) = t$. By Lévy's characterization theorem the claim follows.


Now, if
$$\mathcal{F}^Y_t = \mathcal{F}^{\bar W}_t, \tag{14.9}$$

then conditional expectations given $\mathcal{F}^Y_t$ are the same as given $\mathcal{F}^{\bar W}_t$, and Theorem 8.35 on representation of martingales with respect to a Brownian filtration can be used for the martingales $M_i(t)$ in (14.6). Note that for (14.9) to hold it is sufficient that the SDE (14.7) for $Y(t)$ has a unique weak solution. Theorem 8.35 and its corollary (8.67) give that there are predictable processes $g_i(s)$, such that

$$M_i(t) = \int_0^t g_i(s)d\bar W(s), \quad\text{with } g_i(t) = \frac{d\langle M_i, \bar W\rangle(t)}{dt}.$$

It is possible to show that

$$g_1(t) + g_2(t) + g_3(t) = \pi_t(D) + \frac{\pi_t(hA) - \pi_t(h)\pi_t(A)}{B(Y(t))},$$

and the result follows from (14.6). For details see Liptser and Shiryaev (2001) p. 319-325.
Filtering of Diffusions

Let $(X(t), Y(t))$, $0 \le t \le T$, be a diffusion process with respect to independent Brownian motions $W_i(t)$, $i = 1, 2$; $\mathcal{F}_t = \sigma(W_1(s), W_2(s), s \le t)$, and

$$dX(t) = a(X(t))dt + b(X(t))dW_1(t)$$
$$dY(t) = A(X(t))dt + B(Y(t))dW_2(t). \tag{14.10}$$

Assume that the coefficients satisfy the Lipschitz condition, for any of the functions $a, A, b, B$, e.g. $|a(x') - a(x'')| \le K|x' - x''|$; and $B^2(y) \ge C > 0$.

Let $h = h(X(t))$. The function $h$ is assumed to be twice continuously differentiable. We apply Theorem 14.1 to $h(X(t))$. By Itô's formula, Theorem 6.1, we have

$$h(X(t)) = h(X(0)) + \int_0^t Lh(X(s))ds + \int_0^t h'(X(s))b(X(s))dW_1(s),$$

where
$$Lh(x) = h'(x)a(x) + \frac{1}{2}h''(x)b^2(x).$$

By Theorem 14.1

$$\pi_t(h) = \pi_0(h) + \int_0^t \pi_s(Lh)ds + \int_0^t \frac{\pi_s(hA) - \pi_s(h)\pi_s(A)}{B(Y(s))}d\bar W(s), \tag{14.11}$$

where $\bar W(t) = \int_0^t \dfrac{dY(s) - \pi_s(A)ds}{B(Y(s))}$.


The linear case can be solved and is given in the next section.


Kalman-Bucy Filter 


Assume that the signal and the observation processes satisfy linear SDEs with time-dependent non-random coefficients

$$dX(t) = a(t)X(t)dt + b(t)dW_1(t), \tag{14.12}$$
$$dY(t) = A(t)X(t)dt + B(t)dW_2(t), \tag{14.13}$$

with two independent Brownian motions $(W_1, W_2)$, and initial conditions $X(0)$, $Y(0)$. Due to linearity this case admits a closed form solution for the processes $X$ and $Y$ and also for the optimal estimator of $X(t)$ given $\mathcal{F}^Y_t$. In the linear case it is easy to solve the above SDEs and verify that the processes $X(t)$, $Y(t)$ are jointly Gaussian. It is convenient in this case to use the notation $\hat X(t) = \pi_t(X) = E(X(t) \mid \mathcal{F}^Y_t)$.
Theorem 14.6 Suppose that the signal $X(t)$ and the observation $Y(t)$ are given by (14.12) and (14.13). Then the best estimator $\hat X(t) = E(X(t) \mid \mathcal{F}^Y_t)$ satisfies the following SDE

$$d\hat X(t) = \left(a(t) - v(t)\frac{A^2(t)}{B^2(t)}\right)\hat X(t)dt + v(t)\frac{A(t)}{B^2(t)}dY(t), \tag{14.14}$$

where $v(t) = E(X(t) - \hat X(t))^2$ is the squared estimation error. It satisfies the Riccati ordinary differential equation

$$\frac{dv(t)}{dt} = 2a(t)v(t) + b^2(t) - \frac{A^2(t)v^2(t)}{B^2(t)}, \tag{14.15}$$

with initial conditions $\hat X(0)$ and $v(0) = \mathrm{Var}(X(0)) - \dfrac{\mathrm{Cov}^2(X(0), Y(0))}{\mathrm{Var}(Y(0))}$.


PROOF: Apply Theorem 14.1 with $h(x) = x$ (see also equation (14.11)) to have

$$d\hat X(t) = a(t)\hat X(t)dt + \frac{A(t)}{B^2(t)}\bigl(\pi_t(h^2) - (\pi_t(h))^2\bigr)\bigl(dY(t) - A(t)\hat X(t)dt\bigr). \tag{14.16}$$

Note that

$$\pi_t(h^2) - (\pi_t(h))^2 = E\bigl((X(t) - \hat X(t))^2 \mid \mathcal{F}^Y_t\bigr).$$

Because the processes $X$ and $Y$ are jointly Gaussian, $\hat X(t)$ is the orthogonal projection, and $X(t) - \hat X(t)$ is orthogonal (uncorrelated) to $Y(s)$, $s \le t$. But if Gaussian random variables are uncorrelated, they are independent (see Theorem 2.19). Thus $X(t) - \hat X(t)$ is independent of $\mathcal{F}^Y_t$ and we have that $v(t)$ is deterministic,

$$v(t) = E\bigl((X(t) - \hat X(t))^2 \mid \mathcal{F}^Y_t\bigr) = E(X(t) - \hat X(t))^2.$$

The initial value $v(0)$ is obtained by Theorem 2.25 on Normal correlation, as stated in the Theorem. Let $\delta(t) = X(t) - \hat X(t)$. Then $v(t) = E(\delta^2(t))$. The SDE for $\delta(t)$ is obtained from SDEs (14.12) and (14.16) as follows

$$d\delta(t) = a(t)\delta(t)dt + b(t)dW_1(t) - \frac{A^2(t)v(t)}{B^2(t)}\delta(t)dt - \frac{A(t)v(t)}{B(t)}dW_2(t).$$

Applying now Itô's formula to $\delta^2(t)$, we find

$$d\delta^2(t) = \left(2\left(a(t) - \frac{A^2(t)v(t)}{B^2(t)}\right)\delta^2(t) + b^2(t) + \frac{A^2(t)v^2(t)}{B^2(t)}\right)dt + 2\delta(t)\left(b(t)dW_1(t) - \frac{A(t)v(t)}{B(t)}dW_2(t)\right). \tag{14.17}$$

Writing (14.17) in integral form and taking expectation

$$v(t) = v(0) + \int_0^t \left(2\left(a(s) - \frac{A^2(s)v(s)}{B^2(s)}\right)v(s) + b^2(s) + \frac{A^2(s)v^2(s)}{B^2(s)}\right)ds = v(0) + \int_0^t \left(2a(s)v(s) + b^2(s) - \frac{A^2(s)v^2(s)}{B^2(s)}\right)ds, \tag{14.18}$$

which establishes (14.15).


The multi-dimensional case is similar, with the only difference being that 
the equation (14.15) is the matrix Riccati equation for the estimation error 
covariance matrix. 

The Kalman-Bucy filter allows on-line implementation of the above equations, which are used recursively to compute $\hat X(t + \Delta t)$ from the previous values of $\hat X(t)$ and $v(t)$.
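The recursion described above can be sketched with a plain Euler discretization of (14.14) and (14.15). This is an illustrative sketch rather than a production filter: the function names, the step size, the coefficient values and the simulated data are all assumptions made for the example, and the coefficients are taken constant for simplicity.

```python
import math
import random

def kalman_bucy_step(x_hat, v, dY, dt, a, b, A, B):
    """One Euler step of (14.14)-(14.15) with coefficients frozen at the current time."""
    gain = v * A / B**2                                   # v(t) A(t) / B^2(t)
    x_hat_new = x_hat + (a - gain * A) * x_hat * dt + gain * dY
    v_new = v + (2.0 * a * v + b**2 - (A * v / B) ** 2) * dt
    return x_hat_new, v_new

def simulate(T=5.0, dt=0.001, a=-0.5, b=1.0, A=1.0, B=0.5, seed=1):
    """Simulate signal and observation by Euler-Maruyama and run the filter on-line."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)       # X(0) ~ N(0, 1), an assumption for this example
    x_hat, v = 0.0, 1.0           # prior mean and variance matching X(0)
    for _ in range(int(T / dt)):
        dW1 = rng.gauss(0.0, math.sqrt(dt))
        dW2 = rng.gauss(0.0, math.sqrt(dt))
        dY = A * x * dt + B * dW2                 # observation increment (14.13)
        x_hat, v = kalman_bucy_step(x_hat, v, dY, dt, a, b, A, B)
        x += a * x * dt + b * dW1                 # signal increment (14.12)
    return x, x_hat, v

x_T, x_hat_T, v_T = simulate()
print(x_T, x_hat_T, v_T)
```

Only the observation increments `dY` enter the estimator, and the error variance `v` evolves deterministically, independently of the data, as the theorem states.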


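Example 14.1: (Model with constant coefficients) Consider the case of constant coefficients, $a(t) = a$, $b(t) = 1$, $A(t) = c$, $B(t) = 1$, that is,

$$dX(t) = aX(t)dt + dW_1(t), \qquad dY(t) = cX(t)dt + dW_2(t).$$

In this case

$$d\hat X(t) = (a - v(t)c^2)\hat X(t)dt + cv(t)dY(t), \tag{14.19}$$

and $v(t)$ satisfies the Riccati equation

$$\frac{dv(t)}{dt} = 2av(t) + 1 - c^2v^2(t). \tag{14.20}$$

This equation has an explicit solution:

$$v(t) = \frac{\gamma\alpha e^{\lambda t} + \beta}{\gamma e^{\lambda t} + 1}, \tag{14.21}$$

where $\alpha$ and $\beta$ are the roots of $1 + 2ax - c^2x^2$, assumed to be $\alpha > 0$, $\beta < 0$, $\lambda = c^2(\alpha - \beta)$, and $\gamma = (\sigma^2 - \beta)/(\alpha - \sigma^2)$, with $\sigma^2 = \mathrm{Var}(X(0))$. Using $v(t)$ above the optimal estimator $\hat X(t)$ is found from (14.19).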
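The explicit solution (14.21) can be checked against a direct numerical integration of the Riccati equation (14.20). The sketch below uses a classical fourth-order Runge-Kutta scheme; the parameter values `a = -1`, `c = 2`, `var0 = 1` and the function names are hypothetical choices made for the illustration, and the closed form assumes $\sigma^2 \neq \alpha$.

```python
import math

def riccati_closed_form(t, a, c, var0):
    """Evaluate (14.21) for dv/dt = 2 a v + 1 - c^2 v^2 with v(0) = var0."""
    disc = math.sqrt(a * a + c * c)
    alpha = (a + disc) / (c * c)      # positive root of 1 + 2 a x - c^2 x^2
    beta = (a - disc) / (c * c)       # negative root
    lam = c * c * (alpha - beta)
    gamma = (var0 - beta) / (alpha - var0)
    e = math.exp(lam * t)
    return (gamma * alpha * e + beta) / (gamma * e + 1.0)

def riccati_rk4(t_end, a, c, var0, n_steps=1000):
    """Integrate the same Riccati equation numerically with classical RK4."""
    def f(v):
        return 2.0 * a * v + 1.0 - (c * v) ** 2
    h = t_end / n_steps
    v = var0
    for _ in range(n_steps):
        k1 = f(v)
        k2 = f(v + 0.5 * h * k1)
        k3 = f(v + 0.5 * h * k2)
        k4 = f(v + h * k3)
        v += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return v

for t in (0.2, 1.0):
    print(t, riccati_closed_form(t, -1.0, 2.0, 1.0), riccati_rk4(t, -1.0, 2.0, 1.0))
```

As $t$ grows both values approach the positive root $\alpha$, the steady-state estimation error.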


For more general results see, for example, Liptser and Shiryayev (2001), 
Rogers and Williams (1990), Oksendal (1995). There is a vast amount of 
literature on filtering, see, for example, Kallianpur G. (1980), Krishnan V. 
(1984) and references therein. 
14.2 Random Oscillators


Second order differential equations
$$\ddot x + h(x, \dot x) = 0$$

are used to describe a variety of physical phenomena, and oscillations are one of them.


Example 14.2: (Harmonic oscillator)

The autonomous vibrating system is governed by $\ddot x + x = 0$, $x(0) = 0$, $\dot x(0) = 1$, where $x(t)$ denotes the displacement from the static equilibrium position. It has the solution $x(t) = \sin(t)$. The trajectories of this system in the phase space $x_1 = x$, $x_2 = \dot x$ are closed circles.


Example 14.3: (Pendulum)

The undamped pendulum is governed by $\ddot x + a\sin x = 0$, where $x(t)$ denotes the angular displacement from the equilibrium. Its solution cannot be obtained in terms of elementary functions, but can be given in the phase plane, $(\dot x)^2 = 2a\cos x + C$.
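The phase-plane relation for the pendulum follows from a one-line energy argument, sketched here for completeness: multiply the equation $\ddot x + a\sin x = 0$ by $2\dot x$ and recognize a total derivative.

```latex
\begin{aligned}
0 &= 2\dot x\,\ddot x + 2a\,\dot x \sin x
   = \frac{d}{dt}\Bigl(\dot x^{2} - 2a\cos x\Bigr), \\
\text{hence}\quad \dot x^{2} &= 2a\cos x + C ,
\end{aligned}
```

where the constant $C$ is fixed by the initial position and velocity.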


Example 14.4: (Van der Pol oscillator)
In some systems the large oscillations are damped, whereas small ones are boosted (negative damping). Such motion is governed by the Van der Pol equation

$$\ddot x - a(1 - x^2)\dot x + x = 0, \tag{14.22}$$

where $a > 0$. Its solution cannot be obtained in terms of elementary functions, even in the phase plane.


Example 14.5: (Rayleigh oscillator)

$$\ddot x - a(1 - \dot x^2)\dot x + x = 0, \tag{14.23}$$

where $a > 0$.
We consider random excitations of such systems by White noise (in applied language) of the form $\sum_{i=1}^2 f_i(x, \dot x)\dot W_i(t)$, where $\dot W_i(t)$, $i = 1, 2$, are White noises with delta-type correlation functions

$$E\bigl(\dot W_i(t)\dot W_j(t + \tau)\bigr) = 2\pi K_{ij}\delta(\tau).$$

Thus the randomly perturbed equation has the form

$$\ddot X + h(X, \dot X) = \sum_{i=1}^2 f_i(X, \dot X)\dot W_i(t).$$

The white noise is formally the derivative of Brownian motion. Since Brownian motion is nowhere differentiable, the above system has only a formal meaning.
A rigorous meaning to such equations is given by a system of first order Itô stochastic equations (a single vector valued equation). The representation as a system of two first order equations follows the same idea as in the deterministic case by letting $x_1 = x$ and $x_2 = \dot x$. Using Theorem 5.20 on conversion of Stratonovich SDEs into Itô SDEs, we end up with the following Itô system of stochastic differential equations

$$dX_1 = X_2dt,$$
$$dX_2 = \Bigl(-h(X_1, X_2) + \pi\sum_{j,k} K_{jk}f_j(X_1, X_2)\frac{\partial f_k}{\partial x_2}(X_1, X_2)\Bigr)dt + \sum_{i=1}^2 f_i(X_1, X_2)dW_i(t).$$

The above system is a two-dimensional diffusion, and it has the same generator as the system below driven by a single Brownian motion $B(t)$,

$$dX_1 = X_2dt,$$
$$dX_2 = A(X_1, X_2)dt + G(X_1, X_2)dB(t), \tag{14.24}$$

where

$$A(x_1, x_2) = -h(x_1, x_2) + \pi\sum_{j,k} K_{jk}f_j(x_1, x_2)\frac{\partial f_k}{\partial x_2}(x_1, x_2),$$
$$G(x_1, x_2) = (2\pi)^{1/2}\Bigl(\sum_{j,k} K_{jk}f_j(x_1, x_2)f_k(x_1, x_2)\Bigr)^{1/2}.$$

The system (14.24) is a rigorous mathematical model of random oscillators, and this form is the starting point of our analysis. Solutions to the resulting Fokker-Planck equations give densities of invariant measures or stationary distributions for such random systems. The corresponding Fokker-Planck equation has the form

$$x_2\frac{\partial p_s}{\partial x_1} + \frac{\partial}{\partial x_2}(Ap_s) - \frac{1}{2}\frac{\partial^2}{\partial x_2^2}(G^2p_s) = 0,$$

where $p_s(x_1, x_2)$ is the density of the invariant measure. When a solution to the Fokker-Planck equation has a finite integral, then it is a stationary distribution of the process. However, stationary distributions may not exist, especially in systems with trajectories approaching infinity. In such cases invariant measures which are not probability distributions may exist and provide information about the underlying dynamics.
To illustrate, consider a stochastic differential equation of order two with no noise in $\dot x$,

$$\ddot X + h(X, \dot X) = \sigma\dot B.$$

The corresponding Itô system is given by

$$dX_1 = X_2dt,$$
$$dX_2 = -h(X_1, X_2)dt + \sigma dB(t).$$

In the notation of Section 6.10, the dimension $n = 2$ and there is only one Brownian motion, $d = 1$, so that $X(t) = (X_1(t), X_2(t))^T$, $b(X(t)) = (b_1(X(t)), b_2(X(t)))^T = (X_2(t), -h(X_1(t), X_2(t)))^T$, $\sigma(X(t)) = (0, \sigma)^T$. The diffusion matrix

$$a = \sigma\sigma^T = \begin{pmatrix} 0 & 0 \\ 0 & \sigma^2 \end{pmatrix},$$

hence the generator is given by

$$L = \sum_{i=1}^2 b_i\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{i=1}^2\sum_{j=1}^2 a_{ij}\frac{\partial^2}{\partial x_i\partial x_j} = x_2\frac{\partial}{\partial x_1} - h(x_1, x_2)\frac{\partial}{\partial x_2} + \frac{1}{2}\sigma^2\frac{\partial^2}{\partial x_2^2}. \tag{14.25}$$

An important example is provided by linear equations
$$\ddot X + a\dot X + bX = \sigma\dot B, \tag{14.26}$$

with constant coefficients. Solutions to these equations can be written in a general formula,

$$X(t) = \exp(Ft)\left(X_0 + \int_0^t \exp(-Fs)(0, \sigma)^T dB(s)\right),$$

where $F = \begin{pmatrix} 0 & 1 \\ -b & -a \end{pmatrix}$ and $\exp(Ft)$ stands for the matrix exponential (see, for example, Gard (1988)).

Solutions to the Fokker-Planck equation corresponding to some linear systems (14.26) are given in Bezen and Klebaner (1996).
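For the damped linear oscillator (14.26) with $a > 0$, $b > 0$, one concrete piece of the stationary picture can be computed directly. The sketch below relies on a standard fact that is not stated in the text: for a stable linear system $dX = FX\,dt + g\,dB$ the stationary covariance $P$ solves the Lyapunov equation $FP + PF^T + gg^T = 0$. The parameter values are arbitrary, and `numpy` is assumed to be available.

```python
import numpy as np

a, b, sigma = 0.8, 2.0, 0.5                  # illustrative damping, stiffness, noise level

F = np.array([[0.0, 1.0],
              [-b, -a]])                     # drift matrix of the first order system
g = np.array([[0.0], [sigma]])               # noise enters the velocity equation only
Q = g @ g.T

# Row-major vectorization: vec(F P) = kron(F, I) vec(P), vec(P F^T) = kron(I, F) vec(P).
I2 = np.eye(2)
lyapunov_operator = np.kron(F, I2) + np.kron(I2, F)
P = np.linalg.solve(lyapunov_operator, -Q.reshape(-1)).reshape(2, 2)

print(P)
print(sigma**2 / (2 * a * b), sigma**2 / (2 * a))
```

Solving the three scalar equations by hand gives $P = \mathrm{diag}\bigl(\sigma^2/(2ab),\ \sigma^2/(2a)\bigr)$, so displacement and velocity are uncorrelated in the stationary regime, and the printed matrix should match these values.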


Non-linear Systems 


For some systems the Fokker-Planck equation can be solved by the method of detailed balance. We give examples of such systems; for details see Bezen and Klebaner (1996). In what follows $K_0$, $K_1$ denote scaling constants.
Duffing Equation

Consider additive stochastic perturbations of the Duffing equation:
$$\ddot X + a\dot X + X + bX^3 = \dot W.$$

The density of the invariant measure is given by

$$p_s(x_1, x_2) = \exp\left(-\frac{a}{2\pi K_0}\Bigl(x_2^2 + x_1^2 + \frac{1}{2}bx_1^4\Bigr)\right).$$

Typical phase portraits and densities of invariant measures are shown in Figure 14.1.


Random Oscillator 


Consider the following oscillator with additive noise

$$\ddot X - a(1 - X^2 - \dot X^2)\dot X + X = \dot W.$$

The density of the invariant measure is given by

$$p_s = \exp\left(-\frac{a(x_1^2 + x_2^2 - 1)^2}{4\pi K_0}\right).$$

It follows that when $a = 0$ the surface representing the invariant density is a plane. When $a > 0$ a fourth order surface with respect to $x_1, x_2$ has its maximum on a curve representing the limit cycle of the deterministic equation.

Consider now the same equation with parametric noise of the form

$$\ddot X - a(1 - X^2 - \dot X^2)\dot X + X = \dot W_0 + (X^2 + \dot X^2)\dot W_1.$$

The density of the invariant measure is given by

$$p_s = \bigl(K_0 + 2K_1x_1^2x_2^2 + K_1x_1^4 + K_1x_2^4\bigr)^{-\frac{a + 2\pi K_1}{4\pi K_1}}\exp\left(\frac{2a\arctan\bigl(\sqrt{K_1/K_0}\,(x_1^2 + x_2^2)\bigr)}{4\pi\sqrt{K_0K_1}}\right).$$

Typical phase portraits and densities of the invariant measures are shown in Figure 14.2.


A System with a Cylindric Phase Plane 


Consider random perturbations to a system with a cylindric phase plane, $-\pi \le x < \pi$, $-\infty < \dot x < \infty$,
$$\ddot X + a\dot X + b + \sin(X) = \dot W.$$

Its invariant density is given by

$$p_s = \exp\left(-\frac{a(x_2^2 + 2bx_1 - 2\cos x_1)}{2\pi K_0}\right).$$

Typical phase portraits and densities of invariant measures are shown in Figure 14.3.

Figure 14.1: The Duffing equation. Panels: a=1, b=-1, c=0, K0=1; a=1, b=-1, c=0.2, K0=1.

Figure 14.2: Random oscillator. Panels: additive noise, a=0.1, K0=1 (two panels); parametric noise, a=0.1, K0=1, K1=1.

For applications of stochastic differential equations see, for example, Soong 
(1973). The Fokker-Planck equation was studied, for example, in Soize (1994). 


14.3 Exercises 


Exercise 14.1: Let $X$ be a square integrable random variable. Show that the value of the constant $c$ for which $E(X - c)^2$ is the smallest is given by $EX$.


Exercise 14.2: Let $X, Y$ be square integrable random variables. Show that
$$E(X - E(X|Y))^2 \le E(X - Z)^2$$

for any $\mathcal{F}^Y$-measurable random variable $Z$. Hint: show that $X - E(X|Y)$ and $Z$ are uncorrelated, and write $X - Z = (X - E(X|Y)) + (E(X|Y) - Z)$.


Exercise 14.3: Let $M(t)$ be an $\mathcal{F}_t$-martingale and σ-fields $\mathcal{G}_t \subset \mathcal{F}_t$. Show that if $M(t)$ is $\mathcal{G}_t$-measurable, then it is a $\mathcal{G}_t$-martingale.


Exercise 14.4: Let $M(t)$ be an $\mathcal{F}_t$-martingale and σ-fields $\mathcal{G}_t \subset \mathcal{F}_t$. Show that $\hat M(t) = E(M(t) \mid \mathcal{G}_t)$ is a $\mathcal{G}_t$-martingale.


Exercise 14.5: Let $W(t)$ be an $\mathcal{F}_t$-Brownian motion, and $\mathcal{G}_t \subset \mathcal{F}_t$. By Exercise 14.4 $\hat W(t) = E(W(t) \mid \mathcal{G}_t)$ is a $\mathcal{G}_t$-martingale. Give an example of $\mathcal{G}_t$ such that $\hat W(t)$ is not a Brownian motion.


Exercise 14.6: (Observation of a constant)
Let the signal be a constant $X(t) = c$, for all $t$, and the observation process satisfy $dY(t) = X(t)dt + dW(t)$. Give the Kalman-Bucy filter and find $\hat X(t)$.


Exercise 14.7: (Observation of Brownian motion)

Let the signal be a Brownian motion $X(t) = W_1(t)$, and the observation process satisfy $dY(t) = X(t)dt + dW_2(t)$. Give the Kalman-Bucy filter and find $\hat X(t)$.


Exercise 14.8: (Filtering of indirectly observed stock prices)

Let the signal follow the Black-Scholes model $X(t) = X(0)\exp(\sigma W_1(t) + \mu t)$, and the observation process satisfy $dY(t) = X(t)dt + dW_2(t)$. Give the Kalman-Bucy filter and find $\hat X(t)$.
Figure 14.3: The system with a cylindric phase plane.


Solutions to Selected
Exercises


Exercises to Chapter 3 


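Exercise 3.1: $X = \mu + AZ$ for the vector $\mu$ and the vector of independent standard Normal random variables $Z$. For $t = (t_1, \ldots, t_n)$

$$E(e^{itX}) = E(e^{it(\mu + AZ)}) = e^{it\mu}E(e^{i(tA)Z}) = e^{it\mu}\varphi(tA),$$

where $\varphi$ is the characteristic function of the vector $Z$. By independence $\varphi(u) = E(e^{iuZ}) = \prod_{j=1}^n E(e^{iu_jZ_j}) = \prod_{j=1}^n e^{-u_j^2/2} = e^{-\frac{1}{2}\sum_{j=1}^n u_j^2}$. Hence $\varphi(tA) = e^{-\frac{1}{2}(tA)(tA)^T} = e^{-\frac{1}{2}tAA^Tt^T} = e^{-\frac{1}{2}t\Sigma t^T}$. Finally

$$E(e^{itX}) = e^{it\mu - \frac{1}{2}t\Sigma t^T}.$$


Exercise 3.2: $EX = \int_0^\infty x\,dF(x) = \int_0^\infty\int_0^x dt\,dF(x) = \int_0^\infty\int_t^\infty dF(x)\,dt = \int_0^\infty (1 - F(t))dt.$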


Exercise 3.3: If $f(t)$ is non-increasing then $\int_0^\infty f(t)dt = \sum_{n=0}^\infty \int_n^{n+1} f(t)dt \le \sum_{n=0}^\infty f(n)$. Now apply the previous result.


Exercise 3.9: For $x > 0$, by using the distribution of $M(t) = \max_{s\le t} B(s)$,
$P(|B(t)| > x) = P(B(t) > x) + P(B(t) < -x) = 2P(B(t) > x) = P(M(t) > x)$.


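Exercise 3.10: By Theorem 3.18, $E(T_x^r) = \int_0^\infty t^r f_{T_x}(t)dt = \frac{|x|}{\sqrt{2\pi}}\int_0^\infty t^{r - \frac{3}{2}}e^{-\frac{x^2}{2t}}dt = \frac{|x|}{\sqrt{2\pi}}\int_0^\infty s^{-r - \frac{1}{2}}e^{-\frac{x^2s}{2}}ds$, using the substitution $s = 1/t$. The integral converges at infinity for any $r$. At zero it converges only for $r + \frac{1}{2} < 1$.


Exercise 3.11: $f_M(y) = \int_{-\infty}^y f_{B,M}(x, y)dx = \int_{-\infty}^y \sqrt{\frac{2}{\pi}}\,\frac{2y - x}{t^{3/2}}e^{-\frac{(2y - x)^2}{2t}}dx = \sqrt{\frac{2}{\pi t}}\,e^{-\frac{y^2}{2t}} = 2f_B(y)$, by (3.16).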
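Exercise 3.12: $\min_{s\le t} B(s) = -\max_{s\le t}(-B(s))$. Let $W(t) = -B(t)$, then it is also a Brownian motion and we have $P(B(t) \ge x, \min_{s\le t} B(s) \le y) = P(W(t) \le -x, \max_{s\le t} W(s) \ge -y) = 1 - \Phi\bigl(\frac{x - 2y}{\sqrt{t}}\bigr)$. With $\varphi = \Phi'$, for $y \le 0$, $x \ge y$, $f_{B,m}(x, y) = -\frac{\partial^2}{\partial x\partial y}P(B(t) \ge x, m(t) \le y) = -\frac{2}{t}\varphi'\bigl(\frac{x - 2y}{\sqrt{t}}\bigr) = \sqrt{\frac{2}{\pi}}\,\frac{x - 2y}{t^{3/2}}e^{-\frac{(x - 2y)^2}{2t}}$.


Exercise 3.13: Let $a > 0$. Let $D = \{(x, y) : y - x > a\}$. Then we have
$$P(M(t) - B(t) > a) = \iint_D f_{B,M}(x, y)dxdy = \int_0^\infty\int_{-\infty}^{y - a} f_{B,M}(x, y)dxdy$$
$$= \int_0^\infty\left(\int_{-\infty}^{y - a}\sqrt{\frac{2}{\pi}}\,\frac{2y - x}{t^{3/2}}e^{-\frac{(2y - x)^2}{2t}}dx\right)dy = \int_0^\infty\sqrt{\frac{2}{\pi t}}\,e^{-\frac{(y + a)^2}{2t}}dy = 2\int_a^\infty\frac{1}{\sqrt{2\pi t}}e^{-\frac{u^2}{2t}}du$$
$$= 2P(B(t) > a).$$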


Exercise 3.14: $T_1 = \inf\{t > 0 : B(t) = 0\}$. Since any interval $(0, \epsilon)$ contains a zero of Brownian motion, $T_1 = 0$; the second zero $T_2$ is also zero.


Exercise 3.15: By the above argument, any zero of Brownian motion is a 
limit from the right of other zeroes. By the definition of T, it is a zero of X, 
but is not a limit of other zeroes. Thus X is not a Brownian motion. 


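Exercise 3.16: Since $tB(1/t)$ is a Brownian motion, the law of the iterated logarithm gives $\limsup_{t\to\infty}\frac{tB(1/t)}{\sqrt{2t\ln\ln t}} = 1$, and $\frac{tB(1/t)}{\sqrt{2t\ln\ln t}} = \frac{B(1/t)}{\sqrt{2(1/t)\ln\ln t}}$. Thus with $\tau = 1/t$, $\limsup_{\tau\to 0}\frac{B(\tau)}{\sqrt{2\tau\ln\ln(1/\tau)}} = 1$. Similarly for $\liminf$.

Exercise 3.17: $(B(e^{2\alpha t_1}), B(e^{2\alpha t_2}), \ldots, B(e^{2\alpha t_n}))$ is a Gaussian vector. The finite-dimensional distributions of $X(t) = e^{-\alpha t}B(e^{2\alpha t})$ are obtained by multiplying this vector by a non-random diagonal matrix $A$, with the diagonal elements $(e^{-\alpha t_1}, e^{-\alpha t_2}, \ldots, e^{-\alpha t_n})$. Therefore finite-dimensional distributions of $X$ are multivariate Normal. The mean is zero. Let $s < t$, then
$$\mathrm{Cov}(X(s), X(t)) = e^{-\alpha s}e^{-\alpha t}E\bigl(B(e^{2\alpha s})B(e^{2\alpha t})\bigr) = e^{-\alpha(s + t)}e^{2\alpha s} = e^{-\alpha(t - s)}.$$
Note that this Gaussian process has correlated increments.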


Exercise 3.18: The process $X(t) = e^{-\alpha t}B(e^{2\alpha t})$ has the given mean and covariance functions and is continuous. Since a Gaussian process is determined by these two functions this is the required version.


Exercise 3.19: $E(e^{uS_{n+1}} \mid S_n) = E(e^{uS_n}e^{u\xi_{n+1}} \mid S_n) = E(e^{u\xi_{n+1}})E(e^{uS_n} \mid S_n) = e^{u^2/2}e^{uS_n}$. Multiply both sides by $e^{-(n+1)u^2/2}$ for the martingale property.


Exercise 3.21: If $X_1 = 3$, then $X_2$ must be 1, implying that $X_3 = 2$, and cannot be 3, so that $P(X_3 = 3 \mid X_2 = 1 \text{ or } 2, X_1 = 3) = 0$. Using standard calculations of conditional probabilities we can see that $P(X_3 = 3 \mid X_2 = 1 \text{ or } 2) = 1/2$.

Exercise 3.22: 1. If $p = 1$ then $X(t) = aX(t - 1) + Z(t)$. $P(X(t) \in A \mid \mathcal{F}_{t-1}) = P(aX(t - 1) + Z(t) \in A \mid \mathcal{F}_{t-1}) = P(aX(t - 1) + Z(t) \in A \mid X(t - 1))$. If $p \ge 2$ then the distribution of $X(t)$ depends on both $X(t - 1)$ and $X(t - 2)$.

3. The one-step transition probability function is found by
$P(X(t) \le y \mid X(t - 1) = x) = P(aX(t - 1) + Z(t) \le y \mid X(t - 1) = x) = P(ax + Z(t) \le y \mid X(t - 1) = x) = P(Z(t) \le y - ax) = \Phi\bigl(\frac{y - ax}{\sigma}\bigr)$. The one-step transition probability density function is $\frac{\partial}{\partial y}\Phi\bigl(\frac{y - ax}{\sigma}\bigr) = \frac{1}{\sigma}\varphi\bigl(\frac{y - ax}{\sigma}\bigr)$.

Exercises to Chapter 4 


Exercise 4.1: The necessary condition is $\int_0^t (t - s)^{-2\alpha}ds < \infty$. This holds if and only if $\alpha < 1/2$.


Exercise 4.2: By definition $X(t) = \sum_{i=0}^{n-1}\xi_i I_{(t_i, t_{i+1}]}(t)$ for $t > 0$. Hence
$$\int_0^t X(s)dB(s) = \int_0^T X(s)I_{(0,t]}(s)dB(s) = \sum_{i=0}^{n-1}\xi_i\bigl(B(t_{i+1}\wedge t) - B(t_i\wedge t)\bigr),$$
where $t_i \wedge t = \min(t_i, t)$. The Itô integral is continuous as a sum of continuous functions.


Exercise 4.3: The first statement follows by using moment generating functions: $e^{\mu_nt + \sigma_n^2t^2/2}$ converges implies that $\mu_n \to \mu$ and $\sigma_n^2 \to \sigma^2 \ge 0$. If $\sigma^2 = 0$, then the limit is a constant $\mu$, otherwise, the limit is $N(\mu, \sigma^2)$.

An Itô integral of a nonrandom function is a limit of approximating Itô integrals of simple nonrandom functions $X_n(t)$. Since $X_n(t)$ takes finitely many nonrandom values, $\int_0^T X_n(t)dB(t)$ has a Normal distribution with mean zero and variance $\int_0^T X_n^2(t)dt$. By the first statement $\int_0^T X(t)dB(t)$ has a Normal distribution with mean zero and variance $\int_0^T X^2(t)dt$. An alternative derivation of this fact is done by using the martingale exponential of the martingale $u\int_0^t X(s)dB(s)$.


Exercise 4.5: $M(T)$ has a Normal distribution, $EM^2(T) < \infty$. By Jensen's inequality for conditional expectation (p. 45) with $g(x) = x^2$,

$E(M^2(T) \mid \mathcal{F}_t) \ge (E(M(T) \mid \mathcal{F}_t))^2 = M^2(t)$. Therefore $E(M^2(t)) \le E(M^2(T))$ and $M(t)$ is a square-integrable martingale. The covariance between $M(s)$ and $M(t) - M(s)$ for $s < t$ is zero, because by the martingale property $E(M(t) - M(s) \mid \mathcal{F}_s) = 0$ and $E(M(s)(M(t) - M(s))) = EE(M(s)(M(t) - M(s)) \mid \mathcal{F}_s) = E(M(s)E(M(t) - M(s) \mid \mathcal{F}_s)) = 0$. Now, if jointly Gaussian variables are uncorrelated, they are independent. Thus the increment $M(t) - M(s)$ is independent of $M(s)$.

Let $M(t) = \int_0^t X(s)dB(s)$. Since $X(s)$ is nonrandom, $E\int_0^t X^2(s)ds = \int_0^t X^2(s)ds < \infty$ by assumption. Thus the Itô integral $M(t)$ is a martingale. It is also Gaussian by Exercise 4.3. Thus it is a square-integrable martingale with independent increments.


Exercise 4.6: Take $f(x) = x^2$, then Itô's formula gives
$dX^2(t) = 2X(t)dX(t) + d[X, X](t)$, so that $[X, X](t) = X^2(t) - X^2(0) - 2\int_0^t X(s)dX(s)$.


Exercise 4.7: Let f(x) = vo, then Itô’s formula gives 
ESES A) AY) of"(X dX, X](¢) 


Basa — SONON o?(X(t))dt. Rearranging we obtain 


d¥(t)=4 RQ + s] dt + dB(t). 


Exercise 4.10: Let $f(x, t) = txe^{-x}$. Then if $x = B(t)$, we have $X(t)/Y(t) = f(B(t), t)$. Using Itô's formula (with partial derivatives denoted by subscripts)
$$df(B(t), t) = f_x(B(t), t)dB(t) + \frac{1}{2}f_{xx}(B(t), t)d[B, B](t) + f_t(B(t), t)dt$$
$$= t(1 - B(t))e^{-B(t)}dB(t) + \frac{1}{2}t(B(t) - 2)e^{-B(t)}dt + B(t)e^{-B(t)}dt.$$


Exercise 4.13: $E|M(t)| = E|B^3(t) - 3tB(t)| \le E|B^3(t)| + 3tE|B(t)| < \infty$, since a Normal distribution has all moments. Use the expansion $(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$ with the decomposition $B(t + s) = B(t) + (B(t + s) - B(t))$. Take $a = B(t)$, $b = B(t + s) - B(t)$, and use the fact that $E(B(t + s) - B(t))^3 = 0$, $E(B(t + s) - B(t))^2 = s$, to get the martingale property. Using Itô's formula
$dM(t) = d(B^3(t) - 3tB(t)) = 3B^2(t)dB(t) + \frac{1}{2}6B(t)dt - 3B(t)dt - 3tdB(t) = (3B^2(t) - 3t)dB(t)$. Since $\int_0^T E(3B^2(t) - 3t)^2dt < \infty$, $M(t)$ is a martingale on $[0, T]$ for any $T$.


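Exercise 4.15: $df(B_1(t), \ldots, B_n(t)) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(B)dB_i(t) + \frac{1}{2}\sum_{i=1}^n \frac{\partial^2 f}{\partial x_i^2}(B)dt = \nabla f \cdot dB + \frac{1}{2}\nabla\cdot\nabla f\,dt$, where "$\cdot$" is the scalar product of vectors and $B = (B_1(t), \ldots, B_n(t))$ with $dB = (dB_1(t), \ldots, dB_n(t))$. The operator $\nabla\cdot\nabla = \Delta = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}$ is the Laplacian, so that Itô's formula becomes

$$df(B(t)) = \frac{1}{2}\Delta f(B(t))dt + \nabla f \cdot dB.$$


Exercise 4.16: Denote $\varphi(x) = \Phi'(x)$, then
$$\frac{\partial}{\partial x}\Phi\Bigl(\frac{x}{\sqrt{T - t}}\Bigr) = \varphi\Bigl(\frac{x}{\sqrt{T - t}}\Bigr)\frac{1}{\sqrt{T - t}}, \qquad \frac{\partial^2}{\partial x^2}\Phi\Bigl(\frac{x}{\sqrt{T - t}}\Bigr) = \varphi'\Bigl(\frac{x}{\sqrt{T - t}}\Bigr)\frac{1}{T - t} = -\frac{x}{(T - t)^{3/2}}\varphi\Bigl(\frac{x}{\sqrt{T - t}}\Bigr),$$
using $\varphi'(x) = -x\varphi(x)$, and
$$\frac{\partial}{\partial t}\Phi\Bigl(\frac{x}{\sqrt{T - t}}\Bigr) = \frac{1}{2}\frac{x}{(T - t)^{3/2}}\varphi\Bigl(\frac{x}{\sqrt{T - t}}\Bigr).$$
Thus by Itô's formula for all $0 \le t < T$,
$$d\Phi\Bigl(\frac{B(t)}{\sqrt{T - t}}\Bigr) = \varphi\Bigl(\frac{B(t)}{\sqrt{T - t}}\Bigr)\frac{1}{\sqrt{T - t}}dB(t) \quad\text{and}\quad \Phi\Bigl(\frac{B(t)}{\sqrt{T - t}}\Bigr) = \frac{1}{2} + \int_0^t \varphi\Bigl(\frac{B(s)}{\sqrt{T - s}}\Bigr)\frac{1}{\sqrt{T - s}}dB(s).$$

Since $\varphi\bigl(\frac{B(s)}{\sqrt{T - s}}\bigr)\frac{1}{\sqrt{T - s}} \le \frac{1}{\sqrt{T - t}}$ for $s \le t$, the Itô integral above is a martingale. Thus for all $t < T$ and $s < t$, $X(t) = \Phi\bigl(\frac{B(t)}{\sqrt{T - t}}\bigr)$ satisfies $E(X(t) \mid \mathcal{F}_s) = X(s)$.
Next, as $t \to T$, $X(t) \to Y = I(B(T) > 0) + \frac{1}{2}I(B(T) = 0)$. The martingale property holds also for $t = T$ by dominated convergence.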
Exercise 4.18: $dX(t) = tdB(t) + B(t)dt$, $d[X, X](t) = (dX(t))^2 = t^2dt$, $[X, X](t) = \int_0^t s^2ds = t^3/3$.


Exercise 4.19: $X(t) = tB(t) - \int_0^t sdB(s) = \int_0^t sdB(s) + \int_0^t B(s)ds - \int_0^t sdB(s) = \int_0^t B(s)ds$. Thus $X(t)$ is differentiable and of finite variation. Hence $[X, X](t) = 0$. Itô integrals of the form $\int_0^t h(s)dB(s)$ have a positive quadratic variation. In this case the Itô integral is of the form $\int_0^t h(t, s)dB(s)$.


Exercises to Chapter 5 


Exercise 5.1: By re 4.5, Jo s)dB(s) is a Gaussian process. 
= fjal s)ds + Jo W(s dB(s) is also Gaussian as Jo a s)ds is non-random. 


Exercise 5.3: $dX(t) = X(t)(B(t)dt + B(t)dB(t))$. $X(t) = \mathcal{E}(R)(t)$, where
$R(t) = \int_0^t B(s)ds + \int_0^t B(s)dB(s)$. Thus $X(t) = e^{\int_0^t B(s)ds - \frac{1}{2}\int_0^t B^2(s)ds + \int_0^t B(s)dB(s)}$.


Exercise 5.4: Let $dM(t) = B(t)dB(t)$. Then $dX(t) = -X(t)dt + dM(t)$,
which is a Langevin type SDE. Solving similarly to Example 5.6 we obtain
$X(t) = e^{-t}\left(1 + \int_0^t e^sdM(s)\right) = e^{-t}\left(1 + \int_0^t e^sB(s)dB(s)\right)$.

The SDE is not of the diffusion type as $\sigma(t) = B(t)$. By introducing $Y(t) =
B(t)$, it is a diffusion in two dimensions.


Exercise 5.5: Let $U(t) = \mathcal{E}(B)(t)$. Then $dU(t) = U(t)dB(t)$. $dU^2(t) =
2U(t)dU(t) + d[U,U](t) = 2U^2(t)dB(t) + U^2(t)dt = U^2(t)(2dB(t) + dt)$. So that

$U^2(t) = \mathcal{E}(2B(t) + t)$.

Exercise 5.6: dX(t) = X(t)(X(t)dt + dB(t)). If dY(t) = X(t)dt + dB(t) 

then X(t) = E(Y)(t). 

Exercise 5.10: By definition, $P(y,t,x,s) = P(B(t)+t \le y\,|\,B(s)+s = x)
= P(B(t)-B(s)+t-s \le y-x\,|\,B(s)+s = x) = P(B(t)-B(s)+t-s \le y-x)$
by independence of increments. $B(t)-B(s)$ has $N(0,t-s)$ distribution, and

$P(y,t,x,s) = \Phi\left(\frac{y-x-t+s}{\sqrt{t-s}}\right)$.

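Exercise 5.12: $X(t) = \int_0^t\sqrt{X(s)+1}\,dB(s)$. Assuming it is a martingale,
$EX(t) = 0$, $EX^2(t) = E\left(\int_0^t\sqrt{X(s)+1}\,dB(s)\right)^2 = E\left(\int_0^t(X(s)+1)ds\right) = t$.
$de^{uX(t)} = ue^{uX(t)}dX(t) + \frac{1}{2}u^2e^{uX(t)}d[X,X](t)$. $d[X,X](t) = (X(t)+1)dt$, and
$de^{uX(t)} = ue^{uX(t)}\sqrt{X(t)+1}\,dB(t) + \frac{1}{2}u^2e^{uX(t)}(X(t)+1)dt$. Taking expectation,
$m(t) = Ee^{uX(t)}$ satisfies $m(t) = 1 + \frac{1}{2}u^2E\left(\int_0^t e^{uX(s)}(X(s)+1)ds\right) = 1 + \frac{1}{2}u^2\int_0^t E(e^{uX(s)}X(s))ds +
\frac{1}{2}u^2\int_0^t E(e^{uX(s)})ds$. Thus $\frac{\partial m}{\partial t} = \frac{u^2}{2}E(e^{uX(t)}X(t)) + \frac{u^2}{2}Ee^{uX(t)}$, and the
desired PDE follows.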

Exercises to Chapter 6


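Exercise 6.1: Denote $L = \frac{1}{2}\frac{\partial^2}{\partial x^2}$ and $f(x,t) = e^{ux-u^2t/2}$. Then $Lf = \frac{1}{2}u^2f$
and $\frac{\partial f}{\partial t} = -\frac{1}{2}u^2f$. Thus $Lf + \frac{\partial f}{\partial t} = 0$. By Itô's formula and Corollary 6.4
$f(B(t),t) = e^{uB(t)-u^2t/2}$ is a martingale. Since $f$ solves $Lf + \frac{\partial f}{\partial t} = 0$ for any
fixed $u$, take the partial derivative $\frac{\partial}{\partial u}$ and interchange the order of differentiation
to have that for any fixed $u$, $\frac{\partial}{\partial u}Lf + \frac{\partial}{\partial u}\frac{\partial f}{\partial t} = L\left(\frac{\partial f}{\partial u}\right) + \frac{\partial}{\partial t}\frac{\partial f}{\partial u} = 0$, so that
$\frac{\partial f}{\partial u}$ solves the backward equation for any fixed $u$ and in particular for $u = 0$.
Calculating the derivatives we get the result.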
Exercise 6.2: The po Tao is given by (6.30) L = 2 a —a#t. The 
backward equation is > ot = asl = of The fundamental solution is given 
by the probability density of the solution to the SDE X(t). 
p(t, z, y) = ẸP(t, x,y), where P(t, x,y) = P(X(t) < y|X(0) = 2). By (5.13) 
P(t, x,y) = P(xe™®t + e7% i e%™dB(s) < y) = PG, ed B(s) < ye% — x) 


=o (An) since f e°dB(s) has N(0, 4(e** — 1)) distribution. Thus 


p(t, x,y) = E P(t, x, y=¢ (Aa) ya with ¢@ denoting the density 
of N(0,1). 


Exercise 6.4: $L = \frac{1}{2}\frac{d^2}{dx^2} + cx\frac{d}{dx}$. Take $f(x) = x^2$, then $Lf(x) = 1 + 2cx^2$ and
by Theorem 6.11 $X^2(t) - \int_0^t\left(Lf + \frac{\partial f}{\partial t}\right)(X(s))ds = X^2(t) - t - 2c\int_0^t X^2(s)ds$
is a martingale.


Exercise 6.6: Using Corollary 6.4 or Itô's formula, we have $df(B(t)+t) =
f'(B(t)+t)dB(t) + f'(B(t)+t)dt + \frac{1}{2}f''(B(t)+t)dt$. A necessary condition
for $f(B(t)+t)$ to be a martingale is that the $dt$ term is zero. This gives $f'(x) +
\frac{1}{2}f''(x) = 0$. For example, take $f(x) = e^{-2x}$ and check directly that $e^{-2B(t)-2t}$
is a martingale ($e^{-2B(t)-2t} = \mathcal{E}(-2B)(t)$).


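Exercise 6.7: $df(X(t),t) = \frac{\partial f}{\partial x}dX(t) + \frac{1}{2}\frac{\partial^2f}{\partial x^2}d[X,X](t) + \frac{\partial f}{\partial t}dt$. The term
with $dB(t)$ is given by $\sigma(X(t),t)\frac{\partial f}{\partial x}(X(t),t)dB(t)$. Thus the PDE for $f$ is

$\sigma(x,t)\frac{\partial f}{\partial x}(x,t) = 1.$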

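Exercise 6.8: By letting $v' = y$ we obtain a first order differential equation
$y' + \frac{2\mu}{\sigma^2}y + \frac{2}{\sigma^2} = 0$. Use the integrating factor $V(x) = \exp\left(\int^x\frac{2\mu(s)}{\sigma^2(s)}ds\right)$, to
obtain $(yV)' = -\frac{2}{\sigma^2}V$ and $y = -\frac{1}{V}\int\frac{2}{\sigma^2}V$. The result follows.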

Exercise 6.9: The scale function (6.50) S’(x) = e` fe en at? gives 


2 


S — Ce r P(T T, S(x)— Sla) _ -23r —=ba 5b -2a 
(2) = Ce“. P,(T, < Ta) = SAER = (cH? Ht) (6° O_o), 

Exercise 6.10: The generator of Brownian motion is $L = \frac{1}{2}\frac{d^2}{dx^2}$. By Theorem
6.6 $f(x,t) = E(B^2(T)|B(t) = x) = E((x + B(T) - B(t))^2|B(t) = x)$. By
independence of increments it is the same as $E(x + B(T) - B(t))^2 = x^2 + T - t$.


Exercise 6.12: Check condition 2, Theorem 6.23 to decide convergence of the 


integral f° exp (- J or ds) Gi sry exp( P BScds)dy) de. f ai= 


2 221, When 3-— 2a > 0 then exp(—z?- ela oxy le") Liy~S 


o2 3-20 oye 

3-2a 
as x — oo and fi exp(—a3-?/o7) fE Lew dyde converges. Gerad 
quently, the process explodes. When 3— 2a < 0 then the integral diverges and 
there is no explosion. The case a = 3/2 needs further analysis. 


Exercise 6.13: Check conditions of Theorem 6.28. $\mu(x) = 0$, $\sigma(x) = 1$. $I_1 =
\int_{-\infty}^{x_0}1du = \infty$, $I_2 = \int_{x_0}^{\infty}1du = \infty$. Hence $B(t)$ is recurrent. For the process
$B(t)+t$, $\mu(x) = 1$ and $\sigma(x) = 1$. $I_2 = \int_{x_0}^{\infty}\exp(-2(u-x_0))du = \frac{1}{2} < \infty$. Thus
$B(t)+t$ is transient. The Ornstein-Uhlenbeck process is left to the reader.


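Exercise 6.15: For $n = 2$, $S(x) = \ln x$. So that for $0 < y < x < b$, $P_x(T_y < T_b) =
\frac{\ln b - \ln x}{\ln b - \ln y}$. Since $X(t)$ does not explode, $T_b \uparrow \infty$ as $b \uparrow \infty$. Therefore
$P_x(T_y < \infty) = \lim_{b\to\infty}P_x(T_y < T_b) = \lim_{b\to\infty}\frac{\ln b - \ln x}{\ln b - \ln y} = 1$. For $n \ge 3$,
$S(x) = \frac{x^{1-n/2}}{1-n/2}$, and $P_x(T_y < T_b) = \frac{x^{1-n/2} - b^{1-n/2}}{y^{1-n/2} - b^{1-n/2}} \to \left(\frac{y}{x}\right)^{n/2-1} < 1$. The explosion
test shows that $X(t)$ does not explode, and $\lim_{b\to\infty}T_b = \infty$.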


Exercise 6.17: Check (6.67). Since $1 = \int_\alpha^\beta p(t,x,y)dy = \int_\alpha^\beta p(t,y,x)dy$, for
any $C$, $C = \int_\alpha^\beta Cp(t,y,x)dy$. Since $C$ is a distribution on $(\alpha,\beta)$, it must be
the uniform density.


Exercise 6.18: To classify 0 as a boundary, calculate $L_1$, $L_2$ and $L_3$, Remark
6.5, see also Theorem 6.29. $\mu(x) = b(a-x)$, $\sigma^2(x) = \sigma^2x$. For constants $C_1$, $C_2$
$C_1\int u^{-2ba/\sigma^2}du < L_1 < C_2\int u^{-2ba/\sigma^2}du$, which shows that $L_1$ converges if
$2ba/\sigma^2 < 1$. Thus 0 is a natural boundary iff $2ba/\sigma^2 \ge 1$. $L_2 < \infty$. $L_3 < \infty$
if $ba > 0$. So if $0 < 2ba/\sigma^2 < 1$, 0 is a regular boundary point, and if $ba < 0$,
0 is an absorbing boundary.


Exercises to Chapter 7 


Exercise 7.1: $\mathcal{G}_t \subset \mathcal{F}_t$. Let $s < t$. By the smoothing property of conditional
expectation 3, p. 55 (double expectation), $E(M(t)|\mathcal{G}_s) = E(E(M(t)|\mathcal{F}_s)|\mathcal{G}_s)
= E(M(s)|\mathcal{G}_s) = M(s)$.

Exercise 7.3: By convexity, property 6 p. 55 of conditional expectation,
$E(g(X(t))|\mathcal{F}_s) \ge g(E(X(t)|\mathcal{F}_s))$. By the submartingale property $E(X(t)|\mathcal{F}_s) \ge
X(s)$. Since $g$ is non-decreasing the result follows.

Exercise 7.4: Since square-integrable martingales are uniformly integrable
(Corollary 7.8), there is $Y$ such that $M(t) = E(Y|\mathcal{F}_t)$. $Y = \lim_{t\to\infty}M(t)$. By
Fatou's lemma (p. 41) $E(Y^2) \le \liminf_{t\to\infty}E(M^2(t)) < \infty$.


Exercise 7.5: Let $\tau$ be the first time $B(t)$ hits $a$ or $b$ when $B(0) = x$,
$a < x < b$. $\tau$ is finite by Theorem 3.13. By stopping the martingale $B(t)$ we find
that $P(B(\tau) = b) = (x-a)/(b-a)$, $P(B(\tau) = a) = 1 - P(B(\tau) = b)$, Example
7.6. Stopping $M(t) = B^2(t) - t$, $EM(\tau\wedge t) = M(0) = x^2$, or $EB^2(\tau\wedge t) - E(\tau\wedge t) = x^2$.
Take $t \to \infty$. $\tau\wedge t \to \tau$, $B^2(\tau\wedge t) \to B^2(\tau)$, and by dominated convergence
$EB^2(\tau) - E(\tau) = x^2$. But $EB^2(\tau) = a^2P(B(\tau) = a) + b^2P(B(\tau) = b) =
a^2\frac{b-x}{b-a} + b^2\frac{x-a}{b-a}$. Thus $E(\tau) = EB^2(\tau) - x^2$, and the result follows.
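
The last step is plain algebra, and it can be checked exactly with rational arithmetic. The sketch below is only an illustration under the formulas stated above; the function name `expected_exit_time` is hypothetical. It evaluates $EB^2(\tau) - x^2$ from the exit probabilities and compares it with the closed form $(x-a)(b-x)$.

```python
from fractions import Fraction

def expected_exit_time(a, x, b):
    """E(tau) = E B^2(tau) - x^2 for Brownian motion started at x, a < x < b."""
    a, x, b = Fraction(a), Fraction(x), Fraction(b)
    p_b = (x - a) / (b - a)          # probability of exiting at b
    p_a = 1 - p_b                    # probability of exiting at a
    second_moment = a**2 * p_a + b**2 * p_b
    return second_moment - x**2

if __name__ == "__main__":
    for a, x, b in [(-1, 0, 2), (0, 1, 3), (-2, Fraction(1, 2), 5)]:
        # both printed values agree: E(tau) = (x - a)(b - x)
        print(expected_exit_time(a, x, b), (Fraction(x) - a) * (b - Fraction(x)))
```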


Exercise 7.6: Let $\tau = \inf\{t : B(t) - t/2 = a \text{ or } b\}$. As in the previous exercise,
we obtain $EM(\tau) = M(0) = e^x$, or $e^aP(M(\tau) = e^a) + e^b(1 - P(M(\tau) = e^a)) =
e^x$. This gives $P(B(\tau) - \tau/2 = a) = P(M(\tau) = e^a) = \frac{e^b - e^x}{e^b - e^a}$.


Exercise 7.8: When the game is fair, $p = q = 1/2$, use formula (7.17) to see
that $u \to 1$. If $p \ne q$, use (7.20) to see that $u \to 1$ when $p < q$, and $u \to (q/p)^x$
when $p > q$.


Exercise 7.12: $\int_0^T\mathrm{sign}^2(B(s))ds = T < \infty$. Thus $X(t)$ is a martingale.
$[X,X](t) = \int_0^t\mathrm{sign}^2(B(s))ds = t$. By Levy's theorem $X$ is a Brownian motion.

Exercise 7.13: $[M,M](t) = \int_0^t e^{2s}ds = \frac{1}{2}(e^{2t} - 1)$. Its inverse function is
$g(t) = \frac{1}{2}\ln(2t+1)$. $M(g(t))$ is a Brownian motion by the DDS Theorem 7.37.


Exercise 7.16: $X(t) = X(0) + A(t) + M(t) = X(0) + \int_0^t\mu(s)ds + \int_0^t\sigma(s)dB(s)$
is a local martingale. Therefore $A(t) = X(t) - X(0) - M(t)$ is a local martingale as
a difference of two local martingales. $A(t)$ is continuous. By using Corollary
7.30, $A(t)$ has infinite variation unless it is a constant. Since $A(t)$ is of finite
variation it must be zero. The result follows.


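Exercise 7.17: By Itô's formula
$f(B(t),t) - f(B(0),0) - \int_0^t\left(\frac{1}{2}\frac{\partial^2f}{\partial x^2}(B(s),s) + \frac{\partial f}{\partial t}(B(s),s)\right)ds = \int_0^t\frac{\partial f}{\partial x}(B(s),s)dB(s)$. The r.h.s.
is a continuous local martingale and the l.h.s. is of finite variation. This can
only happen when the local martingale is a constant. Thus $\frac{\partial f}{\partial x} = 0$, and $f$ is
a constant in $x$, hence a function of $t$ alone.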


Exercise fen 1 a af = 4B?(t) — $t, and B?(t) = 2Y(t) +t. Thus 
B(t) = ay E = . Therefore dY (t) = sign(B(t)) \/2Y (t) + tdB(t) 
or dY (t a n ), where dW (t) = sign(B(t))dB(t), W is a Brow- 
nian en ae sae follows by Theorem 5.11. Alternatively, one 
can prove it directly by calculating of moments of Y(t), un(t) = EY” (t) (by 


using Ito’s formula p(t) = ze- Jo (stin—2(s) + 2ftn—1(s))ds) and checking 
Carleman’s condition $` (zn)~!/" = 00, see Feller (1971). 

Exercises to Chapter 8


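Exercise 8.1: $I_{(\tau_1,\tau_2]}(t) = I_{[0,\tau_2]}(t) - I_{[0,\tau_1]}(t)$ is a difference of two left-continuous
adapted functions. Let $\tau$ be a stopping time. Then $X(t) = I_{[0,\tau]}(t)$ is adapted.
$\{X(t) = 0\} = \{t > \tau\} = \bigcup_{n=1}^{\infty}\{\tau \le t - 1/n\} \in \mathcal{F}_t$, since $\{\tau \le t - 1/n\} \in
\mathcal{F}_{t-1/n} \subset \mathcal{F}_t$. See p. 54.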


Exercise 8.2: The left-continuous modification of the process $H(t - \delta + \epsilon)$ is
adapted and as $\epsilon \to 0$ tends to $H(t - \delta)$.


Exercise 8.4: Let $X(t) = \int_0^tH(s)dM(s)$, then it is a local martingale as a
stochastic integral with respect to a local martingale.
$[X,X](t) = \int_0^tH^2(s)d[M,M](s)$. Thus $E[X,X](T) < \infty$. By Theorem
7.35 this condition implies $X$ is a square integrable martingale.

Exercise 8.5: $E\left(\int_0^1N(t-)dM(t)\right) = 0$, so that $Var\left(\int_0^1N(t-)dM(t)\right) =
E\left(\int_0^1N(t-)dM(t)\right)^2 = E\int_0^1N^2(t-)d[M,M](t)$. $[M,M](t) = [N,N](t) =
N(t)$, so $Var\left(\int_0^1N(t-)dM(t)\right) = E\int_0^1N^2(t-)dN(t) = E\sum_{\tau_i\le1}N^2(\tau_{i-1})$,
where $\tau_i$ denotes the time of the $i$-th jump of $N$. But $N(\tau_i) = i$, so that
$Var\left(\int_0^1N(t-)dM(t)\right) = E\sum_{\tau_i\le1}(i-1)^2 = E\sum_{i=1}^{N(1)}(i-1)^2 = E\sum_{k=0}^{N(1)-1}k^2
= E(2N^3(1) - 3N^2(1) + N(1))/6$, since $\tau_i \le 1$ is equivalent to $i \le N(1)$, and
$\sum_{k=0}^nk^2 = (2(n+1)^3 - 3(n+1)^2 + n + 1)/6$. Alternatively $\int_0^1N^2(t-)dN(t)$
can be obtained by using formula (1.20) p. 12. Compute moments using the
mgf $m(s) = Ee^{sN(1)} = e^{e^s-1}$. $m'(0) = EN(1) = 1$, $m''(0) = EN^2(1) = 2$,
$m^{(3)}(0) = EN^3(1) = 5$, giving $Var\left(\int_0^1N(t-)dM(t)\right) = 5/6$.
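
The value $5/6$ can be reproduced numerically without the mgf. This is a minimal sketch under the assumption that $N(1)$ is Poisson with rate 1, as in the solution above; the helper names `poisson_moment` and `variance_of_integral` are illustrative. It sums the Poisson series for the first three moments and substitutes them into $E(2N^3(1) - 3N^2(1) + N(1))/6$.

```python
import math

def poisson_moment(k, rate=1.0, terms=60):
    """E N^k for N ~ Poisson(rate), by direct summation of the series."""
    return sum(n**k * math.exp(-rate) * rate**n / math.factorial(n)
               for n in range(terms))

def variance_of_integral():
    """E(2 N^3 - 3 N^2 + N) / 6 with N = N(1) ~ Poisson(1)."""
    m1, m2, m3 = (poisson_moment(k) for k in (1, 2, 3))
    return (2 * m3 - 3 * m2 + m1) / 6

if __name__ == "__main__":
    print([round(poisson_moment(k), 10) for k in (1, 2, 3)])
    print(variance_of_integral(), 5 / 6)
```

Truncating the series at 60 terms is an arbitrary choice; the neglected tail is far below floating-point precision for rate 1.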


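Exercise 8.6: 1. $S\wedge T$ and $S\vee T$ are stopping times, because $\{S\wedge T > t\} =
\{S > t\}\cap\{T > t\} \in \mathcal{F}_t$, $\{S\vee T \le t\} = \{S \le t\}\cap\{T \le t\} \in \mathcal{F}_t$.

2. The events $\{S = T\}$, $\{S \le T\}$ and $\{S < T\}$ are in $\mathcal{F}_S$.

3. $\mathcal{F}_S\cap\{S \le T\} \subset \mathcal{F}_T\cap\{S \le T\}$. See p. 53 Theorem 2.38.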


Exercise 8.9: $M(t)$ is a continuous martingale, $[M,M](t) = \int_0^tH^2(s)ds = t$.
By Levy's theorem it is a Brownian motion. If $M(t)$ is a Brownian motion
then $[M,M](t) = \int_0^tH^2(s)ds = t$ for all $t$. Taking derivatives $H^2(t) = 1$
Lebesgue a.s.


Exercise 8.10: The proof is not easy if done from first principles. But it is
easy using Levy's theorem. $M(t)$ can be written as follows: $B(t) = \int_0^tdB(s)$, $B(t\wedge T) =
\int_0^tI_{[0,T]}(s)dB(s)$, $M(t) = \int_0^t(2I_{[0,T]}(s) - 1)dB(s)$. The process
$I_{[0,T]}(s)$ is predictable. Clearly, $M(t)$ is a continuous local martingale as an
Itô integral. It is also a martingale, as $E\int_0^t(2I_{[0,T]}(s) - 1)^2ds \le 9t < \infty$.
$[M,M](t) = \int_0^t(2I_{[0,T]}(s) - 1)^2ds$.
If $T > t$, then for all $s \le t$, $I_{[0,T]}(s) = 1$ and $[M,M](t) = t$. If $T \le t$,
then $[M,M](t) = \int_0^T(2I_{[0,T]}(s) - 1)^2ds + \int_T^t(-1)^2ds = T + (t - T) = t$. Thus
$[M,M](t) = t$, for any $T$, and $M$ is a Brownian motion by Levy's theorem.


Exercise 8.11: $d(B(t)M(t)) = B(t)dM(t) + M(t)dB(t) + d[B,M](t)$. Since $M$
is of finite variation, by the property 7 of quadratic variation (8.19) $[B,M](t) =
\sum_{s\le t}\Delta B(s)\Delta M(s) = 0$, since by continuity of Brownian motion $\Delta B(s) =
0$. Thus $B(t)M(t) = \int_0^tB(s)dM(s) + \int_0^tM(s)dB(s)$. Both stochastic integrals
satisfy the condition to be a martingale, i.e. $E\int_0^TB^2(s)d[M,M](s) =
E\int_0^TB^2(s)dN(s) = E\sum_{\tau_i\le T}B^2(\tau_i) = E\sum_i\tau_iI(\tau_i \le T) \le TEN(T) = T^2 < \infty$.
$E\int_0^TM^2(s)d[B,B](s) = E\int_0^TM^2(s)ds = \int_0^TVar(N(s))ds = \int_0^Tsds < \infty$.
The other two statements are proved by similar arguments, verifying that
a purely discontinuous martingale is orthogonal to any continuous martingale,
i.e. their product is a martingale, p. 233.


Exercise 8.12: $dX(t) = \mu X(t)dt + aX(t-)\lambda dt + aX(t-)(dN(t) - \lambda dt) +
\sigma X(t)dB(t)$. $aX(t-)(dN(t) - \lambda dt) + \sigma X(t)dB(t) = dM(t)$ is a martingale, as a
sum of stochastic integrals with respect to martingales. $\int X(t-)dt = \int X(t)dt$.
Thus $X(t)$ is a martingale when $\mu = -a\lambda$.


Exercise 8.13: $B^5(1) = \int_0^1\left(5B^4(t) + 30(1-t)B^2(t) + 15(1-t)^2\right)dB(t)$. Use
Itô's formula for $B^5(t)$ and that the following functions $x^5 - 10tx^3 + 15t^2x$ and
$x^3 - 3xt$ produce martingales.
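
That the two polynomials produce martingales amounts to checking that each solves the backward equation $\frac{\partial p}{\partial t} + \frac{1}{2}\frac{\partial^2p}{\partial x^2} = 0$. The sketch below does this with exact arithmetic; it is an illustration only, and the representation of a polynomial as a dictionary mapping exponent pairs $(i,j)$ of $x^it^j$ to coefficients is an assumption made for this example.

```python
from fractions import Fraction

def d_dx(p):
    """Partial derivative in x of a polynomial {(i, j): coeff} in x^i t^j."""
    return {(i - 1, j): c * i for (i, j), c in p.items() if i > 0}

def d_dt(p):
    """Partial derivative in t of a polynomial {(i, j): coeff} in x^i t^j."""
    return {(i, j - 1): c * j for (i, j), c in p.items() if j > 0}

def add(p, q, scale=1):
    """Return p + scale * q, dropping terms whose coefficient is zero."""
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) + scale * c
    return {key: c for key, c in out.items() if c != 0}

def backward_residual(p):
    """p_t + (1/2) p_xx as a polynomial; an empty dict means identically zero."""
    return add(d_dt(p), d_dx(d_dx(p)), Fraction(1, 2))

if __name__ == "__main__":
    p5 = {(5, 0): 1, (3, 1): -10, (1, 2): 15}   # x^5 - 10 t x^3 + 15 t^2 x
    p3 = {(3, 0): 1, (1, 1): -3}                # x^3 - 3 x t
    print(backward_residual(p5), backward_residual(p3))
```

Both residuals come out empty, which is the statement that the two functions satisfy the backward equation for Brownian motion.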


Exercise 8.15: $\int_0^1(\mathrm{sign}(B(s)) - H(s))dB(s) = 0$. The proposition follows
from $E\int_0^1(\mathrm{sign}(B(s)) - H(s))^2ds = 0$.

Exercise 8.17: We show the continuous case, the case with ae is similar. 
E(X)(t) = eX-21% X10, By (8.82) - ) = nE 3 fp el. By Ito’ 
formula for In U (t) we obtain [X, X]( =e Thus €(X a a 


Exercises to Chapter 9 


Exercise 9.1: Change the order of integration. 


Exercise 9.3: $P(T_1 = x) = p(1-p)^{x-1}$, $x = 1,2,\ldots$. Use the lack of memory
property of the Geometric distribution, $P(T_1 \le t + c|T_1 > t) = P(T_1 \le c)$, to see
that the compensator of $N(t)$ is $p[t]$, where $[t]$ stands for the integer part of $t$.


Exercise 9.4: Repeat the proof of Theorem 9.6 and condition on Fr, _. 


Exercise 9.5: Apply Theorem 9.18 to f(x) = x". 

Exercises to Chapter 10


Exercise 10.2: By Theorem 10.3 the equivalent measure $Q$ is given by
$dQ/dP = e^{-\mu X+\mu^2/2}$ and $dP/dQ = e^{\mu X-\mu^2/2}$.

Exercise 10.3: Since $Y > 0$ and $E(Y/EY) = 1$, let $\Lambda = Y/EY$ and define
$dQ/dP = \Lambda = e^{X-\mu-\sigma^2/2}$, where $Y = e^X$. By Theorem 10.4, the $Q$ distribution of $X$ is
$N(\mu+\sigma^2,\sigma^2)$. Thus $EYI(Y > K) = EY\,E(\Lambda I(Y > K)) = EY\,E_QI(Y > K) =
e^{\mu+\sigma^2/2}Q(Y > K)$. Finally, $EYI(Y > K) = e^{\mu+\sigma^2/2}\Phi((\mu + \sigma^2 - \ln K)/\sigma)$,
using $1 - \Phi(x) = \Phi(-x)$.
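
The closed form can be compared with a direct numerical integration of $E(e^XI(e^X > K))$ for $X \sim N(\mu,\sigma^2)$. The sketch below is illustrative only: the function names, the midpoint rule, the truncation of the upper limit at $\mu + 12\sigma$ and the sample parameter values are all assumptions made for this check.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def closed_form(mu, sigma, K):
    """exp(mu + sigma^2/2) * Phi((mu + sigma^2 - ln K) / sigma)."""
    return math.exp(mu + sigma**2 / 2) * Phi((mu + sigma**2 - math.log(K)) / sigma)

def by_quadrature(mu, sigma, K, n=50000, width=12.0):
    """Midpoint rule for the integral of e^x * N(mu, sigma^2) density over x > ln K."""
    lo = math.log(K)
    hi = mu + width * sigma          # upper tail beyond this point is negligible
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        pdf = math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        total += math.exp(x) * pdf
    return total * h

if __name__ == "__main__":
    print(closed_form(0.1, 0.5, 1.2), by_quadrature(0.1, 0.5, 1.2))
```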


Exercise 10.4: $X(t) = B(t) + \int_0^t\cos s\,ds$. Thus by Girsanov's Theorem 10.16
$dQ/dP = e^{-\int_0^T\cos s\,dB(s) - \frac{1}{2}\int_0^T\cos^2s\,ds}$.


Exercise 10.5: With $X = H\cdot B$, $\mathcal{E}(X)(T) = e^{X(T)-X(0)-\frac{1}{2}[X,X](T)}$.
$[H\cdot B,H\cdot B](T) = \sum_{i,j}[H^i\cdot B^i,H^j\cdot B^j](T) = \sum_{i,j}\int_0^TH^i(t)H^j(t)d[B^i,B^j](t)
= \int_0^T\sum_i(H^i(t))^2dt = \int_0^T|H(s)|^2ds$, since $[B^i,B^j](t) = 0$ for $i \ne j$. Thus

$\mathcal{E}(H\cdot B)(T) = e^{\int_0^TH(s)dB(s) - \frac{1}{2}\int_0^T|H(s)|^2ds}$.


Exercise 10.8: Since H(t) is bounded Ef. H?(t)d[N, N](t) =Ef, H?(t)dt 
is finite and M(t) is a martingale. By (9.5) 

E(M)(t) = elo HO TT 1 + ACH - M)(s))e-44 1), But the jumps 
of the integral occur at the points of jumps of N and AN = AN, so that 
A(H - N)(s) = H(s)AN(s). Next, [pcp 0 ADO = e7 Lose MANO) L 


e S HSAN), Proceeding, [J] ,<;(1+A(H - N)(s)) = edusct POHH(s))AN(s) = 
ak pe aS and the result follows. 
Exercise 10.9: Using (10.52) or (10.51) with pu; (2, t) = pix, i = 1,2, o(x, t) = 
ox, P corresponds to u1, B(t) is a P-Brownian motion. 

(2-144)? 


MX) = B = €(425.B)(T) = FBT Replace B(T) by its 


expression as a function of X. In this case B(T) = (In( 4) — (mı —40°)T)/0. 


Exercise 10.11: By Corollary 10.11 $M'(t)$ is a $Q$-martingale if and only
if $M'(t)\Lambda(t)$ is a $P$-martingale. $d(M'(t)\Lambda(t)) = M'(t)d\Lambda(t) + \Lambda(t)dM'(t) +
d[M',\Lambda](t) = M'(t)d\Lambda(t) + \Lambda(t)dM(t) - \Lambda(t)dA(t) + d[M',\Lambda](t)$. The first
two terms are $P$-(local) martingales, as stochastic integrals with respect to
martingales ($\Lambda(t)$ is a $P$-martingale). Thus $-\Lambda(t)dA(t) + d[M',\Lambda](t) = 0$.
But $[A,\Lambda](t) = 0$, since $A$ is continuous and of finite variation. Thus $dA(t) =
d[M,\Lambda](t)/\Lambda(t)$.

Exercise 10.13:

1. Using Girsanov's Theorem 10.15, there exists an equivalent measure
$Q_1$ (defined by $\Lambda = dQ_1/dP = e^{-T/2-B(T)}$) such that $W(t) = B(t) + t$ is a
$Q_1$-martingale. We show that under $Q_1$, $N(t)$ remains a Poisson process
with rate 1. This follows by independence of $N(t)$ and $B(t)$ under $P$,
$E_{Q_1}e^{uN(t)} = E_P(\Lambda e^{uN(t)}) = E_P(e^{-T/2-B(T)}e^{uN(t)}) = E_P(e^{-T/2-B(T)})E_P(e^{uN(t)})
= E_P(e^{uN(t)})$, since $E_P\Lambda = 1$. This shows that one-dimensional distributions
of $N(t)$ are Poisson (1). Similarly, the increments of $N(t)$ under $Q_1$ are Poisson
and independent of the past. The statement follows.

2. Let $\Lambda = \Lambda(T) = dQ_2/dP = \Lambda_1\Lambda_2 = e^{-2T-2B(T)}e^{-T+N(T)\ln2}
= e^{-3T-2B(T)+N(T)\ln2}$. Then $B(t) + 2t$ is a $Q_2$-Brownian motion, and $N(t)$ is a
$Q_2$-Poisson process with rate 2. To see this, $E_{Q_2}(X) = E_P(\Lambda X) = E_P(\Lambda_1\Lambda_2X)$,
for any $X$, and $E_P\Lambda_1 = E_P\Lambda_2 = 1$. So under $Q_2$, $N(t) - 2t$ is a martingale,
and $B(t) + 2t$ is a martingale, thus $X(t) = B(t) + N(t)$ is a $Q_2$-martingale.

3. The probability measures defined as follows for $a > 0$ are equivalent to
$P$, $dQ_a = e^{-a^2T/2-aB(T)}e^{(1-a)T+N(T)\ln a}dP = e^{(1-a-a^2/2)T-aB(T)+N(T)\ln a}dP$,
and make the process $B(t) + at$ into a $Q_a$-Brownian motion, and $N(t)$ into
a $Q_a$-Poisson process with rate $a$.


Exercises to Chapter 11 


Exercise 11.3: If X = ası +b for some a and b then Eg(X/r) = aEg(S1/r)+ 
b = aSo + b the same for any Q. Note that since the vectors S and 8 are not 
collinear, the representation of X by a portfolio (a, b) is unique. Take the claim 
that pays $1 when the stock goes up and nothing in any other case, then this 
claim is unattainable. Eg(X) = 1.5(0.2 + pa) depends on the choice of pg. 


Exercise 11.5: 1. $E_Q(M(t)|\mathcal{F}_s) = E_P(M(t)|\mathcal{F}_s) = M(s)$ a.s. Take $s = 0$,
$E_PM(t) = E_QM(t) = M(0)$.

2. If a claim is attainable, $X = V(T)$, where $V(t)/\beta(t)$ is a $Q$-martingale.
Its price at time $t$ is $C(t) = V(t)$, which does not depend on the choice of $Q$.
By the martingale property $V(t) = \beta(t)E_Q(V(T)/\beta(T)|\mathcal{F}_t)$. This shows that
$C(t) = \beta(t)E_Q(X/\beta(T)|\mathcal{F}_t)$ is the same for all $Q$'s.


S(t) dS(t) 


Exercise 11.10: Consider continuous time. Toa = ws + S(t)d(g Ha): 
ae) = FO = GRASO — pte ghy). RO = fe SACRE) + ff MD. 


The first integral is a TAT 5 a oa ee aes Lespéct to a Q- 
martingale. Thus Eg R(t) =f a © , which is the risk-free return. In discrete 


time the statement easily follows from the martingale property of S(t)/((t). 


Exercise 11.11: Let $X(t) = S(t)e^{-rt}$. Then it is easy to see that $X(t)$
satisfies the SDE $dX(t) = \sigma X(t)dB(t)$, for a $Q$-Brownian motion $B(t)$. By
Itô's formula $d\left(\frac{1}{X(t)}\right) = -\sigma\frac{1}{X(t)}(dB(t) - \sigma dt) = -\sigma\frac{1}{X(t)}dW(t)$, with $dW(t) =
dB(t) - \sigma dt$. $W(t)$ is a $Q$-Brownian motion with drift. Using the change
of measure $dQ_1/dQ = e^{\sigma B(T)-\sigma^2T/2}$, by Girsanov's theorem $W(t)$ is a $Q_1$-Brownian
motion. Finally, $-W(t)$ is also a $Q_1$-Brownian motion, and we can
remove the minus sign in the SDE.


Exercise 11.12: By Theorem 11.7, $C(t) = E_{Q_1}\left(\frac{S(t)}{S(T)}(S(T) - K)^+\Big|\mathcal{F}_t\right) =
S(t)E_{Q_1}\left(\left(1 - \frac{K}{S(T)}\right)^+\Big|\mathcal{F}_t\right)$. The conditional distribution of $1/S(T)$ given $\mathcal{F}_t$ under
$Q_1$ is obtained by using the SDE for $1/S(t)$. With $Y(t) = 1/X(t) = e^{rt}/S(t)$,
we have from above $dY(t) = \sigma Y(t)dW(t)$ and $1/S(t) = Y(t)e^{-rt}$. Using
the product rule $d\left(\frac{1}{S(t)}\right) = \frac{1}{S(t)}(-rdt + \sigma dW(t))$. Thus $1/S(t)$ is a stochastic
exponential, and $1/S(T) = (1/S(t))e^{(T-t)(-r-\sigma^2/2)+\sigma(W(T)-W(t))}$. Thus given
$\mathcal{F}_t$ the $Q_1$ conditional distribution of $1/S(T)$ is Lognormal with mean and
variance $-\ln S(t) - (T-t)(r + \sigma^2/2)$ and $\sigma^2(T-t)$. Using calculations for
$E(1 - X)^+$ similar to p. 313, we recover the Black-Scholes formula.


Exercise 11.14: From the first equation $b(t) = (V(t) - a(t)S(t))e^{-rt}$. Putting
it in the self-financing condition $dV(t) = a(t)dS(t) + b(t)d(e^{rt})$, we get the SDE
for $V(t)$.

For the other direction, let $b(t)$ be as above. Then $V(t) = a(t)S(t) + b(t)e^{rt}$;
moreover the SDE for $V(t)$ gives the self-financing condition.


Exercise 11.17: Let $F(y) = e^{-rT}E((S_T/S - y)^+)$, then $C = SF(K/S)$.
Now, $\partial C/\partial S = F(K/S) - KF'(K/S)/S$ and $\partial C/\partial K = F'(K/S)$. Thus
$S\,\partial C/\partial S + K\,\partial C/\partial K = SF(K/S) = C$. The expression for $\partial C/\partial S$ in the
Black-Scholes model follows from the Black-Scholes formula.


Exercises to Chapter 12 

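Exercise 12.1: Use Theorem 12.5, $C(t) = E_Q\left(\frac{\beta(t)}{\beta(s)}(P(s,T) - K)^+\Big|\mathcal{F}_t\right)
= E_Q\left(\frac{\beta(t)}{\beta(s)}P(s,T)I(P(s,T) > K)\Big|\mathcal{F}_t\right) - KE_Q\left(\frac{\beta(t)}{\beta(s)}I(P(s,T) > K)\Big|\mathcal{F}_t\right)$. For
the first term take $\Lambda_2$ based on the $Q$-martingale $P(s,T)/\beta(s)$, $s \le T$, which
corresponds to numeraire $P(t,T)$, or $T$-forward measure. Since the expectation
must be unity, $\Lambda_2 = \frac{P(s,T)}{P(0,T)\beta(s)}$ (Formula (12.53)). Then the first term is
$E_Q\left(\frac{\beta(t)}{\beta(s)}P(s,T)X\Big|\mathcal{F}_t\right) = P(t,T)E_{Q_2}(X|\mathcal{F}_t)$. For the second term consider $\Lambda_1 =
1/(P(0,s)\beta(s))$ based on the martingale $P(t,s)/\beta(t)$, $t \le s$, which corresponds
to numeraire $P(t,s)$, or $s$-forward measure. Then for any $X$, $E_Q\left(\frac{\beta(t)}{\beta(s)}X\Big|\mathcal{F}_t\right) =
P(t,s)E_{Q_1}(X|\mathcal{F}_t)$. This gives for the second term $KP(t,s)Q_1(P(s,T) > K|\mathcal{F}_t)$,
and the result follows.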

Exercise 12.3: 1. Follows by $E|X| \le \sqrt{EX^2}$.
2. By additivity of the integral ff Os H (ti, 8) (W(tis1) — W(ti))) ds = 
a (tas )ds) (W (titi) — W (t:)). 

3. Let Xn(s) = 0p H(t, 2 W(ti)). Sone Xn(s )i is an approx- 
imation to the Itô integral X (s =H (t, s)dW(t). EX? (s =f EH? (t, s)ds < 
oo, and under the stated ee KE (Xn(s)—- X (s ))? : converges to zero. 
This eee converges in L? and in probability of AX s)ds to X s)ds. 
The rhs >> =f Ue (ti, s )ds) (W (ti+1) — W(t)) is an re Itô 
integral of v(t ), and converges to it in probability. 


Exercise 12.5: Use the formula for the bond (12.32). $f(t,T) = -\frac{\partial}{\partial T}\ln P(t,T)$.
A cap is a sum of caplets, which are priced by (12.64).


Exercise 12.7: Let Y(t = fir oo ) and X(t) = h B(s)dB(s) for some 
B(s) to be determined. ae E(X)(t) =c+ (1—c)E(Y)(t). Thus 


OEO (1—o€(Y)(t) _ (1- EY) (0) 
xO FON) er O-oeMO ehU-oew 


+(1-— on c+(1—ce T TE y2 (s)ds ` 


Existence of forward rates volatilities follows from equation (12.76). 


B(t) = 


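Exercise 12.9: At $T_i$ the following exchange is made: the amount received
is $f_{i-1}(T_i - T_{i-1})$ and paid out $k(T_i - T_{i-1})$. The resulting amount at time $T_i$
is $1/P(T_{i-1},T_i) - 1 - k\delta$, using $1/P(T_{i-1},T_i) = 1 + f_{i-1}(T_i - T_{i-1})$. Thus the
value at time $t$ of the swap is

$Swap(t) = \sum_{i=1}^nE_Q\left(\frac{\beta(t)}{\beta(T_i)}\left(\frac{1}{P(T_{i-1},T_i)} - 1 - k\delta\right)\Big|\mathcal{F}_t\right)$.

Using the martingale property of the discounted bonds (12.4), we obtain that
$E_Q\left(\frac{\beta(T_{i-1})}{\beta(T_i)}\Big|\mathcal{F}_{T_{i-1}}\right) = P(T_{i-1},T_i)$. The result is obtained by conditioning on
$\mathcal{F}_{T_{i-1}}$ in the above sum.
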
Exercise 12.10: Follows from the previous exercise. 


Exercise 12.11: Consider the portfolio of bonds that at any time $t \le T$ is long
$a_+(t)$ of the $T_0$-bond, short $a_+(t)$ of the $T_n$-bond, and for each $i$, $1 \le i \le n$,
short $a_-(t)\delta k$ of the $T_i$-bond. The value of this portfolio at time $t \le T$ is
$C(t) = a_+(t)(P(t,T_0) - P(t,T_n)) - ka_-(t)b(t)$, with $b(t) = \delta\sum_{i=1}^nP(t,T_i)$.
By using the expression for the swap rate $C(t) = b(t)(a_+(t)k(t) - ka_-(t))$. It
can be seen that this portfolio has the correct final value and is self-financing,
i.e. $dC(t) = a_+(t)d(P(t,T_0) - P(t,T_n)) - ka_-(t)db(t)$.


Exercises to Chapter 13 


Exercise 13.1: Use Theorem 6.16 and formula (6.98) to find $E(T_0\wedge T_b)$.
$ET_0 = \lim_{b\to\infty}E(T_0\wedge T_b)$ by monotone convergence.


Exercise 13.2: 1. $d(e^{-cX(t)}) = -ce^{-cX(t)}dX(t) + \frac{1}{2}c^2e^{-cX(t)}d[X,X](t) =
-ce^{-cX(t)}\sigma\sqrt{X(t)}dB(t)$. By Theorem 13.1 $X(t) \ge 0$ for all $t$. The function
$e^{-cx}\sqrt{x}$ is bounded for $x \ge 0$. Thus $e^{-cX(t)}$ is a martingale as an Itô integral of
a bounded process, Theorem 4.7.

2. Let $\tau = T\wedge t$, then $\tau$ is a bounded stopping time. Applying Optional
stopping, we have $e^{-cx} = Ee^{-cX(0)} = Ee^{-cX(\tau)} = E(e^{-cX(\tau)}I(\tau = T)) +
E(e^{-cX(\tau)}I(\tau = t)) \ge E(e^{0}I(\tau = T)) = P(T \le t)$.


Exercise 13.5: 1. Let G(x) = [7 du/g(u). Then G(x(t)) = G(x(0))+t. Now 


dY (t) = dt + ZW aB(t) - O a 


2. G(a) = x17" /(e(1 — r)) and Y(t a o + Jo ER EAB(s). 

By the growth condition, E aks AS an = fi E( (ae) 4ds < Ct. It 

follows E(4 if oA dB(s))? — 0, and ae 1 a, dB(s) — 0 in 
to lea r 


probability. It follows by the L’Hospital rule that + foz ds — 0 on the 


set {X(t) — co}. The LLN for Y (t) now follows. 


Exercise 13.6: P(min(L1, Lo,.. 


Ly) >t) = P(Li >t, L2 >t,..., Lo >t) 
= (P(Lı > t))? = (et) 2 — ora vt 


Exercise 13.7: 1. m(t) = ie ns I(X(s—) > k)dNp(s) has jumps of size 
one. Re that A} = f AX ds is its compensator. Similarly for the process 


sJ eer X(s—) > k)dNx(s). 
X(t 


a (t) = X (0) + A(t) — Ao(t) + mı (t) — A1 (t) — m2 (t) + A2 (t). 


Exercises to Chapter 14 
Exercise 14.3: $E(M(t)|\mathcal{G}_s) = E(E(M(t)|\mathcal{F}_s)|\mathcal{G}_s) = E(M(s)|\mathcal{G}_s) = M(s)$.


Exercise 14.5: Let $\mathcal{G}_t$ be the trivial $\sigma$-field, then $\widehat{W}(t) = E(W(t)|\mathcal{G}_t) = 0$.

Exercise 14.6: Using formally Theorem 14.6 with $a(t) = b(t) = 0$ and $A(t) =
B(t) = 1$, $d\widehat{X}(t) = -v(t)\widehat{X}(t)dt + v(t)dY(t)$, and $dv(t) = -v^2(t)dt$. Hence
$d\widehat{X}(t) = -(1/t)\widehat{X}(t)dt + (1/t)dY(t)$. We obtain $v(t) = 1/t$ and $\widehat{X}(t) = Y(t)/t$.

These can be obtained directly using $Y(t) = ct + W(t)$.


Exercise 14.8: $dX(t) = (\mu + \sigma^2/2)X(t)dt + \sigma X(t)dB(t)$. Apply Theorem
14.6 with $a(t) = \mu + \sigma^2/2$, $b(t) = \sigma$, $A(t) = B(t) = 1$, see Example 14.1.